APPENDIX A 




The following sections of code accomplish two tasks: 

I) Calculation of the topomeric conformation for a particular 
molecule, assuming that the molecule is referenced by a particular 
row of a Tripos Molecular Spreadsheet (MSS). With minor 
adaptations this code could be used in other molecular modeling 
environments, such as Cerius 2, Quanta, or Insight. 

II) Calculation of the line slope, assuming that the biological
data and one or more columns of property data are stored in a Tripos 
Molecular Spreadsheet (MSS). Almost any other software for 
manipulating data in a spreadsheet or other tabular representation 
could be adapted to perform similar calculations, assuming a 
Tanimoto function for expressing "distances" between bitsets of equal 
cardinality. 
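As an illustration of the Tanimoto "distance" mentioned in task II, the following is a minimal sketch in C. It is not part of the appended code: the function names, the byte-array representation of a bitset, and the convention that two empty bitsets have distance 0 are all assumptions made for this example.

```c
#include <stddef.h>

/* Count the bits set in one byte. */
static int popcount8(unsigned char x)
{
    int n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Tanimoto "distance" between two bitsets of nbytes bytes each:
   1 - |A and B| / |A or B|.  Two empty bitsets are treated as
   identical (distance 0); that convention is an assumption. */
double tanimoto_distance(const unsigned char *a, const unsigned char *b,
                         size_t nbytes)
{
    size_t i;
    int both = 0, either = 0;

    for (i = 0; i < nbytes; i++) {
        both   += popcount8((unsigned char)(a[i] & b[i]));
        either += popcount8((unsigned char)(a[i] | b[i]));
    }
    if (either == 0)
        return 0.0;
    return 1.0 - (double)both / (double)either;
}
```

Both bitsets must have the same length, which corresponds to the "equal cardinality" requirement stated above.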

Both sections of code include procedures written in two languages.
The first is C, familiar to all programmers; the listings include all
specialized structure declarations and brief explanations of all
functions used. The second is SPL, an interpreted language
available within the SYBYL molecular modeling program, whose
syntax is similar to that of a Unix shell script. The SPL language is
described fully in the volume entitled SPL Manual, found within the
documentation set for SYBYL 6.2, release date July 1995. That volume
includes descriptions of all "expression generators" (functions
returning a value) and "macro commands" not specifically explained
below.

I. Topomeric Field Code: 

A. SPL macro CHOM!BUILD_3D. To build topomerically aligned
3D models, the third argument must have the value ALIGN, and the
global associative array element CHOM!Align[ALICYC] must have
the value All_trans. Code that lets the user adjust these and the
other 3D model-building parameters, which appear in this code as other
elements of CHOM!Align[], is not shown.

B. Under these circumstances the following SPL macro
CHOM!AllTrans sets all torsions provided to their topomeric values.

C. To determine the atoms defining each torsion to be adjusted,
CHOM!AllTrans invokes the expression generator %trans_path(),
which executes the following C subroutine SYB_MGEN_CONN_BEST,
with its associated subroutines syb_mgen_conn_att_atoms,
get_path_mw, get_path_xyz, and (if debugging) ashow. No
user-adjustable values are used by this code. All non-obvious include
files and a brief functional description of subroutines external to
this code are provided in section III below.
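The header comment of SYB_MGEN_CONN_BEST states that, in the two-argument case, attached branches are ranked first by whether a ring was encountered, then by number of atoms, then by molecular weight. The sketch below restates that ordering as a strict comparison. It is an illustration only: the struct, its field names, and both function names are hypothetical, the coordinate tie-break is omitted, and the listing itself works on set_ptr paths rather than on a summary record.

```c
/* Hypothetical summary of one branch; the field names are illustrative. */
struct branch {
    int has_ring;   /* nonzero if a ring was encountered along the branch */
    int natoms;     /* number of atoms collected in the branch */
    double mw;      /* summed atomic weight of those atoms */
};

/* Returns 1 if branch x outranks branch y, 0 otherwise. */
int branch_outranks(const struct branch *x, const struct branch *y)
{
    int xr = x->has_ring != 0, yr = y->has_ring != 0;

    if (xr != yr) return xr;                                /* rings first */
    if (x->natoms != y->natoms) return x->natoms > y->natoms;
    return x->mw > y->mw;                                   /* then weight */
}

/* Index of the highest-ranked branch among n (n >= 1);
   ties keep the earlier branch. */
int best_branch(const struct branch *b, int n)
{
    int i, best = 0;

    for (i = 1; i < n; i++)
        if (branch_outranks(&b[i], &b[best])) best = i;
    return best;
}
```

Keeping the comparison in one function makes the priority order easy to read and to test separately from the path-growing logic.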

D. The computation of rotatable-bond-attenuated steric (and/or
electrostatic, hydrogen bonding) fields for the topomerically aligned
conformation is carried out by the C subroutine
QSAR_FIELD_EVAL_RB_ATTEN, which uses the accompanying
subroutine QSAR_FIELD_RB_WTS to generate an attenuated weight
for each atom's contribution to the field(s). (Pseudo code for the
latter subroutine appears in its header comment.) The attenuation
factor (recommended value of 0.85) is a user-adjustable or
"tailorable" value, here shown as COMFA!AGGREG_SCALING. The
user-adjustable HBOND_RAD_SCALING parameter affects the steric
"radius" of a hydrogen-bonding hydrogen.

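To make the attenuation in item D concrete, here is a small sketch in C. It assumes the weight is simply the attenuation factor raised to the number of rotatable bonds between the atom and the root atom. That functional form and the function name are assumptions for illustration; the actual weighting is defined by QSAR_FIELD_RB_WTS.

```c
/* Weight applied to an atom's field contribution when n_rot_bonds
   rotatable bonds separate it from the root atom.  Assumes a simple
   geometric fall-off: factor raised to the power n_rot_bonds. */
double rb_atten_weight(double factor, int n_rot_bonds)
{
    double w = 1.0;
    int i;

    for (i = 0; i < n_rot_bonds; i++)
        w *= factor;
    return w;
}
```

Under this assumption and the recommended factor of 0.85, an atom two rotatable bonds from the root would contribute with weight 0.85 x 0.85 = 0.7225.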
II. Patterson-Distribution Validation Code 

A. The SPL expression generator lrt_fast returns the slope of
the "best" line, along with the count of data points and the
fractional area, within a "virtual" or conceptual graph of absolute
differences in biological activities vs. absolute differences in the
diversity measurement to be validated. The format of its output
appears in the header comment.

B. The short SPL expression generator dochi shows the
computation of the chi-squared statistic resulting from the output of
the lrt_fast expression generator.

C. The C code functions QSHELL_HIER_LRT,
QSHELL_HIER_DO_LRT, and fpt_heapsort generate the results
produced by lrt_fast. These routines generate the biological
differences themselves but rely on some external procedure, not
shown, to generate the distances between the diversity
measurements. (The reason is that the method of calculating
differences depends on the diversity parameter(s). Typically a
Euclidean distance is calculated for scalar properties, or a Tanimoto
difference is calculated for bitsets. If multiple parameters are
combined to form the diversity measurement to be validated, then
the relative weighting must also be specified by the user.)
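The pairwise differences that populate the conceptual graph can be sketched as follows for the simplest case of a single scalar property, where the Euclidean distance reduces to an absolute difference. The function names and array layout are assumptions for this example and do not reproduce QSHELL_HIER_LRT.

```c
#include <stddef.h>

static double abs_diff(double x, double y)
{
    return x > y ? x - y : y - x;
}

/* For n compounds, writes the n*(n-1)/2 pairwise absolute differences of
   activity into d_bio and of a scalar property into d_prop, in the pair
   order (0,1), (0,2), ..., (n-2,n-1).  Returns the number of pairs. */
size_t pairwise_abs_diffs(const double *activity, const double *property,
                          size_t n, double *d_bio, double *d_prop)
{
    size_t i, j, k = 0;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            d_bio[k]  = abs_diff(activity[i], activity[j]);
            d_prop[k] = abs_diff(property[i], property[j]);
            k++;
        }
    }
    return k;
}
```

The caller is expected to supply output arrays with room for n*(n-1)/2 entries each; for bitset descriptors a Tanimoto difference would replace the property difference.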

III. Supporting Information for Interpretation of the C Code in
Sections I and II

A. Declarations of complex and non-standard data structures
referenced by the declarations within these C procedures, specifically
for molecules, atoms, and the regions, fields, and other user input
information that are part of a CoMFA field description.

B. Functional descriptions of all external subroutines called by
these C procedures, ordered alphabetically.

#
# SECTION I-A. Macro BUILD_3D for generating and storing topomeric alignments
#

@macro BUILD_3D CHOM
# builds 3D models,
#   storage in a database or in a conformer column
#
#   either not-aligned (just uses Concord or as-is if from Unity,
#      or minimizes input structure)
#   or aligned for CoMFA (requires core structure as alignment template)
#      with optional fixup of side chains, charge calculation
# $1 is row ids in current MSS
# $2 is storage code (will retrieve structure from same place or somewhere)
# $3 is align (U or A)
# $4 is basic building technique
# other arguments, used only if ALIGN is true, are elements
#   of the global associative array CHOM!ALIGN

# set up mol retrieval from MSS to be fast and clean
localvar AFFECT_SUBSET_save
localvar EXAMINE_TAILOR_MODE_save
localvar HIGHLIGHT_MSS_save
localvar INFORM_save
localvar INPUT_MODE_save
localvar RELATE_save
localvar SHOW_MOLECULE_save
localvar USER_FUNCTION_save natmcore heavy ys
localvar align ma rid cgq_save tailor_bumps_save newc \
         a b max_save usehs rat yrat nrat noth

setvar AFFECT_SUBSET_save $TAILOR!EXAMINE!AFFECT_SUBSET
setvar EXAMINE_TAILOR_MODE_save $TAILOR!EXAMINE!EXAMINE_TAILOR_MODE
setvar HIGHLIGHT_MSS_save $TAILOR!EXAMINE!HIGHLIGHT_MSS
setvar INFORM_save $TAILOR!EXAMINE!INFORM
setvar INPUT_MODE_save $TAILOR!EXAMINE!INPUT_MODE
setvar RELATE_save $TAILOR!EXAMINE!RELATE
setvar SHOW_MOLECULE_save $TAILOR!EXAMINE!SHOW_MOLECULE
setvar USER_FUNCTION_save $TAILOR!EXAMINE!USER_FUNCTION
setvar cgq_save $CGQ_TIMEOUT
set CGQ_TIMEOUT 0



setvar TAILOR!EXAMINE!AFFECT_SUBSET NONE
setvar TAILOR!EXAMINE!EXAMINE_TAILOR_MODE SILENT
setvar TAILOR!EXAMINE!HIGHLIGHT_MSS NO
setvar TAILOR!EXAMINE!INFORM NO
setvar TAILOR!EXAMINE!INPUT_MODE ROW_COLUMN_EXPR
setvar TAILOR!EXAMINE!RELATE NO
setvar TAILOR!EXAMINE!SHOW_MOLECULE YES
setvar TAILOR!EXAMINE!USER_FUNCTION NONE

setvar max_save $TAILOR!MAXIMIN2!LS_STEP_SIZE $TAILOR!MAXIMIN2!MAXIMUM_ITERATIONS
setvar ma %table_attribute( MOL_AREA )

# if needed make new place to put output
setvar newc
switch %substr( $2 1 3 )
case NEW)
   setvar newc %math( %table( * COL COUNT ) + 1 )
   ti^^ONF $newc ) ^)
   table column sin %cat( CONF $newc )
case SYB)
   database open %qspr_table_db( %table_default() ) update
   table ATTRIBUTE SET CONFORMER 0
case )
   setvar newc %substr( $2 1 %math( %pos( _ $2 ) - 1 ) )
   TABLE CONFORMER $newc
endswitch



if %streql( %substr( $3 1 1 ) "A" )
# are we bump checking ?
   if $CHOM!Align[BUMPS]
      setvar tailor_bumps_save $TAILOR!GENERAL!bumps_contact_distance
      tailor set general bumps_contact_distance %math( $CHOM!Align[BUMPS] - 1.0 )
   endif
##
# STEP 1: prepare template fragment
##
   setvar mcore $CHOM!Align[ MCORE ]
# save original template
   setvar mcsav %molempty()
   copy $mcore $mcsav
   default $mcore >$nulldev
   if $CHOM!Align[DEBUG]
      label id *
   endif
   setvar capsln %cat( %sln( $mcore ) )
   setvar natcore %mol_info( $mcore NATOMS )

# If the alignment template has just one free valence,
#   make geometrically acceptable template by adding heavy atoms, minimizing
#   else use as is
   setvar heavy TRUE
   fillvalence *-H* Hal >$nulldev
   if %gt( %math( %mol_info( $mcore NATOMS ) - $natcore ) 1 )
      copy $mcsav $mcore
      setvar heavy
   endif
   if $heavy
      for a in %atoms( <H*>-<H> )
         modify atom type $a C.3 >$nulldev
         modify atom name $a X1 >$nulldev
      endfor
   endif

   TAILOR SET MAXIMIN2 LS_STEP_SIZE 0.0001 MAXIMUM_ITERATIONS 1000 ||
   MAXIMIN $mcore DONE INTERACTIVE >$nulldev

   if $heavy
      for a in %atoms(X1)
         modify atom type $a HEV >$nulldev
# must rename it !!
         modify atom name $a X1 >$nulldev
      endfor
      setvar ys %set_create( %atoms(X1) )
# orient template so that an R points in the positive X direction
      setvar rat %arg( 1 %set_unpack( $ys ) )
      setvar nrat %arg( 1 %atom_info( $rat NEIGHBORS ) )
      setvar yrat %arg( 1 %set_unpack( %set_diff( \
         %set_create( %atom_info( $nrat NEIGHBORS ) ) $rat ) ) )
      ORIENT USER $nrat $rat $yrat >$nulldev
   endif

# identify all the non-primary atoms for FIT, in/out of the search pattern
# and all the basic torsions (bonds to Ys) that potentially need setting
   setvar tpat %arg( 1 %search2d( %cat( %sln( $mcore ) ) $capsln NoDup 0 y ) )
   setvar hvinpat
   setvar patats
   setvar tors
   setvar usehs
   setvar sybhvats %set_create( %atoms( *-<H> ) )
   if %lt( %set_size( $sybhvats ) 3 )
      setvar usehs TRUE
      setvar sybhvats %set_create( %atoms(*) )
   endif

   for a in %range( 1 %sln_atom_count( $capsln ) )
      if %or( "$usehs" "%not( %set_and( %sln_atom_symbol( $capsln $a ) \
            H,F,Cl,Br,I ) )" )
# for FIT, need to know the SYBYL IDs of the heavy atoms
         setvar hvinpat $hvinpat $a
         setvar patats[$a] %sln_rgroup_sybid( $mcore $tpat $a )
         setvar patats[$a][YS] %set_and( "$ys" "%set_create( \
               %atom_info( $patats[$a] NEIGHBORS ) )" )
# for each torsion root, need to save the SLN ID of an arbitrary
#   heavy atom torsional definer
         if $patats[$a][YS]
            setvar tors[$a] %set_and( %set_diff( "%set_create( \
               %atom_info( $patats[$a] NEIGHBORS ) )" $patats[$a][YS] ) $sybhvats )
# if there are several possibilities, prefer the lowest #'d carbon
#   in to define trans-ness
            if %gt( %set_size( $tors[$a] ) 1 )
               if %set_and( $tors[$a] %set_create( %atoms(<C*>) ) )
                  setvar tors[$a] %set_and( $tors[$a] \
                     %set_create( %atoms(<C*>) ) )
               endif
               setvar tors[$a] %arg( 1 %set_unpack( $tors[$a] ) )
            endif
            for a1 in %range( 1 %sln_atom_count( $capsln ) )
               if %eq( $tors[$a] %sln_rgroup_sybid( $mcore $tpat $a1 ) )
                  setvar tors[$a] $a1
                  break
               endif
            endfor
         endif
      endif
   endfor
   if $CHOM!Align[DEBUG]
      echo %prompt( INT 1 " " " " )
   endif
endif

default $ma >$nulldev
setvar CHOM!BadRows
##
# build 3D models
##
# off we go !! Get MSS row IDS to build models for
if %streql( $1 * )
   setvar rids %table( * ROW NUM )
else
   setvar rids %set_unpack( $1 )
endif

for rid in $rids

# get the next MSS entry to be modelled
   table examine $rid || >$nulldev
# fix NO2's (egad what a pain) because Concord & SYBYL are inconsistent
   setvar pat %search2d( %sln( $ma ) N(=O)O ALL 0 y )
   while $pat
      setvar pat %sln_rgroup_sybid( $ma %arg( 1 $pat ) 1 3 )
      modify bond type %bonds( %cat( %arg( 1 $pat ) \
         %arg( 2 $pat ) ) ) 2 >$nulldev
      modify atom type %arg( 2 $pat ) o.2
      setvar pat %search2d( %sln( $ma ) N(=O)O ALL 0 y )
   endwhile
   if $CHOM!Align[DEBUG]
      label id *
   endif

# basic optimization
   switch $4
   case CONCORD)
      CONCORD MOL $ma >$nulldev
# if Concord failed, we may still be awfully flat
#   minimize if there are heavy atoms not part of a single aromatic system
      setvar noth %atoms( *-<H> )
      setvar a1 %arg( 1 $noth )
      if %set_diff( "%set_create( $noth )" \
            "%set_create( %atoms( %cat( "{aromatic(" $a1 ")}" ) ) )" )
         setvar zs %extent_3d( %cat( $ma "(*)" ) )
         setvar zs %math( %arg( 5 $zs ) - %arg( 6 $zs ) )
         if %eq( $zs 0.0 )
            %unflatten( %cat( $ma "(*)" ) )
            MAXIMIN $ma DONE INTERACTIVE
         endif
      endif
   case MINIMIZE)
      MAXIMIN $ma DONE INTERACTIVE >$nulldev
   endswitch

# done, if only 3d coord, but for topomeric CoMFA
   if %streql( %substr( $3 1 1 ) "A" )
# find any arbitrary 2D hit
      setvar pat %search2d( %cat( %sln( $ma ) ) $capsln NoDup 0 y )
      if %not( $pat )
         setvar CHOM!BadRows %set_or( "$CHOM!BadRows" $rid )
         echo $capsln not found in molecule for Row $rid .. skipping
         goto nextl
      endif
      setvar pat %arg( 1 $pat )
      setvar allpatats %set_create( %sln_rgroup_sybid( $ma $pat \
         %range( 1 %sln_atom_count( $capsln ) ) ) )
# collect all appropriate heavy atoms for FIT and torsions
      setvar mat1
      setvar mat2
      setvar schns
      for a in $hvinpat
         setvar mat1 $mat1 $patats[$a]
         setvar sybat %sln_rgroup_sybid( $ma $pat $a )
         setvar mat2 $mat2 $sybat
# are there heavy atom neighbors to FIT also (and generate torsion lists)?
         if $patats[$a][YS]
            setvar ans %set_diff( %set_create( \
               %atom_info( $sybat NEIGHBORS ) ) $allpatats )
            setvar ans %atoms( $ans-<H> )
            setvar i 1
            for p in %set_unpack( $patats[$a][YS] )
# add heavy atom neighbors to FIT list
               if %arg( $i $ans )
                  setvar mat1 $mat1 $p
                  setvar mat2 $mat2 %arg( $i $ans )
# generate another torsion for CHOM!AllTrans
                  setvar schns $schns %cat( $sybat "," \
                     %sln_rgroup_sybid( $ma $pat $tors[$a] ) "," %arg( $i $ans ) )
               endif
               setvar i %math( $i + 1 )
            endfor
         endif
      endfor
      setvar dofit MATCH %cat( $mcore "(" %set_create( $mat1 ) ")" ) \
         %cat( $ma "(" %set_create( $mat2 ) ")" )
      $dofit >$nulldev
      if $CHOM!Align[DEBUG]
         echo %prompt( INT 1 " " " " )
      endif



# do FIT
      if %gt( $MATCH_RMS $CHOM!Align[ FITRMS ] )
         setvar CHOM!BadRows %set_or( "$CHOM!BadRows" $rid )
         echo Bad geometric alignment (MATCH_RMS = $MATCH_RMS) for Row $rid .. skipping
         goto nextl
      endif

# side chain alignments ..
      switch $CHOM!Align[ ALICYC ]
      case User_Macro)
         $CHOM!Align[ ALIDATA ] $ma $CHOM!ALIGN[ MCORE ]
      case All_trans)
      case With_Templates)
         setvar no_rings TRUE
         setvar rbds %set_create( %bonds( {rings()} ) )
         for i in $schns
            setvar jbds %set_unpack( $i )
# can set "side chain" bonds only if connecting bond is not cyclic
            if %set_and( "$rbds" "%bonds( %cat( %arg( 3 $jbds ) = \
                  %arg( 1 $jbds ) ) )" )
               setvar no_rings
            else
               CHOM!AllTrans $jbds
            endif
         endfor
         if $CHOM!Align[DEBUG]
            echo %prompt( INT 1 " " " " )
         endif
         if %streql( $CHOM!Align[ ALICYC ] With_Templates )
            setvar f %open( $CHOM!Align[ ALIDATA ] "r" )
            setvar buff %read( $f )
            setvar slnma %cat( %sln( $ma ) )
            while $buff
# each line of text should have pattern, SLN IDs for the 4 torsion atoms,
#   and a torsion value to set
               if %eq( %count( $buff ) 5 )
                  setvar torpat %search2d( $slnma %arg( 1 $buff ) NoDup 0 y )
                  for t in $torpat
                     MODIFY TORSION %sln_rgroup_sybid( $ma $t %arg( 2 $buff ) \
                        %arg( 3 $buff ) %arg( 4 $buff ) ) %arg( 5 $buff ) >$nulldev
                  endfor
               endif
            endwhile
            %close( $f )
         endif
      endswitch
   endif

# do a bump check?
   if $CHOM!Align[BUMPS]
      if %atoms( {bumps(*,*)} )
         setvar CHOM!BadRows %set_or( "$CHOM!BadRows" $rid )
         echo Bad steric contacts in aligned conformer for Row $rid .. skipping
         goto nextl
      endif
   endif

# partial charges ..
   switch $CHOM!Align[ CHARGE ]
   case None)
   case User_Macro)
      exec $CHOM!Align[ CHARGEDATA ] $ma
   case )
      CHARGE $ma COMPUTE $CHOM!Align[ CHARGE ] | >$nulldev
   endswitch

# put conformer away
   switch %substr( $2 1 3 )
   case SYB)
      database add $ma r >$nulldev
   case )
      %wcell( $rid $newc %cat( %cat( %sln( $ma FULL CHARGE ) ) ) ) >$nulldev
   endswitch

   echo Built row $rid
nextl:
endfor

if %streql( %substr( $3 1 1 ) "A" )
   copy $mcsav $mcore
   zap $mcsav
endif

if $CHOM!Align[BUMPS]
   TAILOR SET GENERAL bumps_contact_distance $tailor_bumps_save ||
endif

# done, restore initial EXAMINE settings
set CGQ_TIMEOUT $cgq_save
setvar TAILOR!EXAMINE!AFFECT_SUBSET $AFFECT_SUBSET_save
setvar TAILOR!EXAMINE!EXAMINE_TAILOR_MODE $EXAMINE_TAILOR_MODE_save
setvar TAILOR!EXAMINE!HIGHLIGHT_MSS $HIGHLIGHT_MSS_save
setvar TAILOR!EXAMINE!INFORM $INFORM_save
setvar TAILOR!EXAMINE!INPUT_MODE $INPUT_MODE_save
setvar TAILOR!EXAMINE!RELATE $RELATE_save
setvar TAILOR!EXAMINE!SHOW_MOLECULE $SHOW_MOLECULE_save
setvar TAILOR!EXAMINE!USER_FUNCTION $USER_FUNCTION_save
TAILOR SET MAXIMIN2 LS_STEP_SIZE %arg( 1 $max_save ) \
   MAXIMUM_ITERATIONS %arg( 2 $max_save ) ||

# update row and column information
if %streql( %substr( $2 1 3 ) NEW )
# make any new conformer column become the source of molecules
   TABLE CONF %table( * COL COUNT )
   CHOM!UPDATE_ROW_SEL $CHOM!CID_Last
   setvar CHOM!CID_Last %math( $CHOM!CID_Last + 1 )
else
   CHOM!UPDATE_ROW_SEL
endif



# Section I-B. Generates the topomeric conformation of the 3D model
#
#==========================================================================

@macro ALLTRANS chom
# assumes default molecule, takes argument atoms $1 and $2
#   where $1 is the JOINed atom of the core, $2 is the atom that
#   the rest of the substituent is to be trans to,
#   and $3 is the JOINed atom of the substituent
#   starts from that atom and sets all side chains
#   to a topomeric conformation

localvar bds b bdset a1 a2 tmp sbonds sats rbond pbds torsion ringbonds doit

# check input for legality
setvar tmp %set_create( %atom_info( $1 NEIGHBORS ) )
if %not( %eq( 2 %count( %set_unpack( %set_and( \
      "$tmp" %cat( $2 "," $3 ) ) ) ) ) )
   echo Bad input to ALLTRANS (atoms $2 $3 not bonded to $1)
   return
endif

# save key bonds
setvar rbond %bonds( %cat( $3 $1 ) )
setvar sats %conn_atoms( $3 $1 )
if %not( $sats )
#   echo No substituent atoms found in ALLTRANS
   return
endif
setvar sats $3 $sats
setvar sbonds %set_create( %bonds( \
      %cat( "{TO_ATOMS(" %set_create($sats) ")}" ) ) )

# define the other bonds that might need adjusting
setvar bds %set_create( %bonds( (*-{RINGS()})&<1> ) )
setvar bds %set_and( "$sbonds" "$bds" )
if %not( $bds )
   return
endif

# discard bonds to primary atoms
setvar mval %set_create( %atoms( \
      <H>+<o.2>+<F>+<I>+<Cl>+<Br>+<n.1>+<LP>+<Du> ) )
setvar pds %set_create( %bonds( %cat( "{TO_ATOMS(" $mval ")}" ) ) )
setvar bds %set_diff( $bds $pds )
setvar ringbonds %set_create( %bonds( {RINGS()} ) )

# walk all the important bonds
for b in %set_unpack( $bds )
   setvar doit TRUE
# if this is the JOIN bond, already have some info
   if %eq( $b $rbond )
      setvar a0 $2
      setvar a1 $1
      setvar a2 $3
# still need to be SURE we're not monovalent
      if %or( "%eq( 1 %count( %atom_info( $a1 NEIGHBORS ) ) )" \
              "%eq( 1 %count( %atom_info( $a2 NEIGHBORS ) ) )" )
         setvar doit
      endif
   else
      setvar bdat %bond_info( $b ORIGIN TARGET )
      setvar a1 %arg( 1 $bdat )
      setvar a2 %arg( 2 $bdat )
      if %or( "%eq( 1 %count( %atom_info( $a1 NEIGHBORS ) ) )" \
              "%eq( 1 %count( %atom_info( $a2 NEIGHBORS ) ) )" )
         setvar doit
      endif
      if $doit
# which end leads to root atom? if necessary flip a1,a2 to make that one be a1
         if %set_and( "%set_create( %conn_atoms( $a2 $a1 ) )" $1 )
            setvar tmp $a1
            setvar a1 $a2
            setvar a2 $tmp
         endif
         setvar a0 %trans_path( $a1 $a2 $1 )
      endif
   endif
   if $doit
      setvar a3 %trans_path( $a2 $a1 )
      switch %count( %set_unpack( "%set_and( "$ringbonds" \
         %set_create( %bonds( %cat( $a0 "=" $a1 "," $a2 "=" $a3 ) ) ) )" ) )
      case 0)
         setvar torsion 180
      case 1)
         setvar torsion 90
      case 2)
         setvar torsion 60
      endswitch
      modify torsion $a0 $a1 $a2 $a3 $torsion >$nulldev
   endif
endfor
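
The switch near the end of ALLTRANS picks the torsion value from the number of flanking bonds (a0 to a1, and a2 to a3) that belong to rings: 180 degrees for none, 90 for one, 60 for two. The same rule can be written as a small C function. The function name and the -1 return for any other count are assumptions added for this sketch; the SPL macro itself has no such default case.

```c
/* Torsion angle (degrees) chosen from the number of flanking bonds
   (a0-a1 and a2-a3) that lie in rings.  Returns -1 for any count
   other than 0, 1 or 2; that sentinel is an assumption of this sketch. */
int topomeric_torsion(int n_flanking_ring_bonds)
{
    switch (n_flanking_ring_bonds) {
    case 0:  return 180;
    case 1:  return 90;
    case 2:  return 60;
    default: return -1;
    }
}
```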



/* Beginning of section I-C, C code implementing the trans_path expression generator */
/*E+ : SYB_MGEN_CONN_BEST*/
/***********************************************************************
*  int SYB_MGEN_CONN_BEST ( identifier, nargs, args, writer )
*    Dick Cramer, Apr. 9, 1995  (written for SELECTOR use)
*
*    Expression generator that returns the atoms attached to a given
*    atom, excepting the second, in a prioritized order.
*    If there are two arguments, the ordering is by decreasing branch
*    "size", where "size" is first any path with rings encountered, then
*    number of attached atoms, then MW (paths in cycles end when an atom
*    in another path is encountered.)
*    If three arguments, the atom that is returned is the one that
*    begins the shortest path containing the atom referred to by the
*    third argument. If multiple such paths, ordering is same as for
*    two arguments.
*    Further prioritization of paths is by molecular weight,
*    and then by lowest X, Y, Z values.
*    If last argument is DEBUG, all paths are written to stdout.
*
*  User interface:
*    %trans_path( a1 a2 (a3) (DEBUG) )
***********************************************************************/

int SYB_MGEN_CONN_BEST ( identifier, nargs, args, Writer )
/* following arguments contain the text supplied to the %trans_path()
   expression generator, and provide an avenue for producing text output. */
char *identifier;
int nargs;
char *args[];
PFI Writer;
{
#define MAX_NP 8
  struct pathrec {
    int root, nrings, chosen, nats;
    float mw, xyz[3];
    set_ptr path;
  };
  struct pathrec p[MAX_NP];
  int retval, i, np, toroot, al, a2, a4, a, pnow, pdone, growing,
      final_pos, area_num, new_rings, nats, nuats, elem, ncycles,
      best, debug, ringclosed;
  List_Ptr atom_exp_list=NIL, SYB_EXPR_ANALYZE ();
  mol_ptr ml, m2, SYB_AREA_GET_MOLECULE ();
  atom_ptr arec, SYB_ATOM_FIND_REC ();
/* A set_ptr data structure is a Boolean set, first word containing
   its cardinality. */
  set_ptr atom_setl=NIL, a2chk = NIL, nuls = NIL, cnats = NIL,
          nxcn = NIL, end_atoms = NIL, scratch = NIL,
          SYB_ATOM_FIND_SET (), UTL_SET_CREATE ();
  char tempString[256];
  float get_path_mw(), diff;
  void get_path_xyz ();
  retval = 0;
/* Check the number of arguments */
  if ( nargs < 2 || nargs > 4 ) {
    UIMS2_WRITE_ERROR (
       "Error: %trans_path requires 2 to 4 arguments\n" );
    return 0;
  }
  np = 0;
  debug = ( !UTL_STR_CMP_NOCASE ( args[nargs - 1], "DEBUG" ));
  toroot = (debug && nargs == 4) || ( !debug && nargs == 3);

/* parse the input */
/* get first atom */
  if ( !(atom_exp_list = SYB_EXPR_ANALYZE ( SYB_EXPR_GET_ATOM_TOKEN, args[0],
                 &final_pos, &area_num )))
     goto error;
  if ( !(ml = SYB_AREA_GET_MOLECULE (area_num) ) )
     goto cleanup;
  if ( !(atom_setl = SYB_ATOM_FIND_SET ( ml, atom_exp_list) ) )
     goto error;
  if ( atom_exp_list )
     SYB_EXPR_DELETE_RPN_LIST( atom_exp_list );
  atom_exp_list = (List_Ptr) NIL;
  if ( !(1 == UTL_SET_CARDINALITY(atom_setl) ) ) {
    UIMS2_WRITE_ERROR (
       "Error: First argument must be only one atom\n");
    goto error;
  }
  if (!(arec = SYB_ATOM_FIND_REC (ml, UTL_SET_NEXT (atom_setl, -1)) )) goto error;
  al = arec->recno;
  UTL_SET_DESTROY( atom_setl );
  atom_setl = NIL;
/* get 2nd atom */
  if ( !(atom_exp_list = SYB_EXPR_ANALYZE ( SYB_EXPR_GET_ATOM_TOKEN, args[1],
                 &final_pos, &area_num )))
     goto error;
  if ( !(m2 = SYB_AREA_GET_MOLECULE (area_num) ) )
     goto cleanup;
  if ( !(end_atoms = SYB_ATOM_FIND_SET ( m2, atom_exp_list) ) )
     goto error;
  if ( atom_exp_list )
     SYB_EXPR_DELETE_RPN_LIST ( atom_exp_list );
  atom_exp_list = (List_Ptr) NIL;
  if (ml != m2 ) {
    UIMS2_WRITE_ERROR (
       "Error: atoms must be in the same molecule\n" );
    goto error;
  }
  if ( !(1 == UTL_SET_CARDINALITY(end_atoms) ) ) {
    UIMS2_WRITE_ERROR (
       "Error: Second argument must be only one atom\n" );
    goto error;
  }
  if (!(arec = SYB_ATOM_FIND_REC (ml, UTL_SET_NEXT (end_atoms, -1)) )) goto error;
  a2 = arec->recno;

/* get 3rd atom */
  if (toroot) {
    if ( !(atom_exp_list = SYB_EXPR_ANALYZE( SYB_EXPR_GET_ATOM_TOKEN, args[2],
                   &final_pos, &area_num )))
       goto error;
    if ( !(m2 = SYB_AREA_GET_MOLECULE (area_num) ) )
       goto cleanup;
    if ( !(atom_setl = SYB_ATOM_FIND_SET ( m2, atom_exp_list) ) )
       goto error;
    if ( atom_exp_list )
       SYB_EXPR_DELETE_RPN_LIST ( atom_exp_list );
    atom_exp_list = (List_Ptr) NIL;
    if (ml != m2 ) {
      UIMS2_WRITE_ERROR (
         "Error: atoms must be in the same molecule\n" );
      goto error;
    }
    if ( !(1 == UTL_SET_CARDINALITY(atom_setl) ) ) {
      UIMS2_WRITE_ERROR (
         "Error: Third argument must be only one atom\n");
      goto error;
    }
    if (!(arec = SYB_ATOM_FIND_REC (ml, UTL_SET_NEXT (atom_setl, -1)) )) goto error;
    a4 = arec->recno;
    UTL_SET_DESTROY ( atom_setl );
    atom_setl = NIL;
  }

/* GENERATE the paths */
/* set up paths */
  if (!(a2chk = UTL_SET_CREATE ( ml->max_atoms + 1 ) )) goto error;
  if (!(nuls = UTL_SET_CREATE ( ml->max_atoms + 1 ) )) goto error;
  if (!(cnats = UTL_SET_CREATE ( ml->max_atoms + 1 ) )) goto error;
  if (!(nxcn = UTL_SET_CREATE ( ml->max_atoms + 1 ) )) goto error;
  if (!(scratch = UTL_SET_CREATE ( ml->max_atoms + 1 ) )) goto error;

  if ( !syb_mgen_conn_att_atoms ( a2chk, ml, al )) goto error;
  if ( !UTL_SET_MEMBER ( a2chk, a2 ) ) {
    UIMS2_WRITE_ERROR (
       "Error: second argument atom is not bonded to first argument atom\n");
    goto error;
  }
  UTL_SET_DELETE ( a2chk, a2 );
  a = -1;
  np = 0;
  while (np < MAX_NP && (a = UTL_SET_NEXT ( a2chk, a)) >= 0 ) {
    if ( !(p[np].path = UTL_SET_CREATE ( ml->max_atoms + 1 ) )) goto error;
    p[np].root = a;
    p[np].nrings = 0;
    UTL_SET_INSERT( p[np].path, a );
    np++;
  }

/* grow the paths */
  growing = TRUE;
  nats = 0;
  ncycles = 0;
  while (growing ) {
    nuats = 0;
    ringclosed = FALSE;
    for (pnow = 0; pnow < np; pnow++ ) {
      UTL_SET_COPY_INPLACE ( cnats, p[pnow].path );
      UTL_SET_CLEAR ( nxcn );
      elem = -1;
/* accumulate this generation of attached atoms into nxcn */
      while ( (elem = UTL_SET_NEXT ( cnats, elem)) >= 0 ) {
        UTL_SET_CLEAR ( nuls );
        if ( !syb_mgen_conn_att_atoms ( nuls, ml, elem )) return ( FALSE );
        UTL_SET_DELETE ( nuls, al );
        UTL_SET_DIFF_INPLACE ( nuls, end_atoms, nuls );
        UTL_SET_OR_INPLACE ( nxcn, nuls, nxcn );
      }
      UTL_SET_DIFF_INPLACE ( nxcn, p[pnow].path, nxcn );
      UTL_SET_OR_INPLACE ( p[pnow].path, nxcn, p[pnow].path );
/* remove and mark ring closures when growing out */
      if (!toroot) for (pdone = 0; pdone < np; pdone++ ) if (pdone != pnow) {
        UTL_SET_AND_INPLACE ( p[pnow].path, p[pdone].path, a2chk );
        if ( (new_rings = UTL_SET_CARDINALITY ( a2chk ))) {
/* we have ring closure(s) */
          p[pnow].nrings += new_rings;
          p[pdone].nrings += new_rings;
          ringclosed = TRUE;
          UTL_SET_OR_INPLACE ( end_atoms, a2chk, end_atoms );
/* if pdone < pnow, two branches are now same lengths, drop common atom from both,
   but if >, branches are different, and must avoid repeated closing */
          if (pdone < pnow) {
/* remove atom(s) in the previous branch because paths are really same length */
            UTL_SET_DIFF_INPLACE ( p[pdone].path, a2chk, p[pdone].path );
            UTL_SET_DIFF_INPLACE ( p[pnow].path, a2chk, p[pnow].path );
          }
          else {
/* must identify and mark each atom in nxcn that is attached to a2chk atom */
            elem = -1;
            while ( (elem = UTL_SET_NEXT ( a2chk, elem)) >= 0 ) {
              UTL_SET_CLEAR ( scratch );
              if ( !syb_mgen_conn_att_atoms ( scratch, ml, elem ))
                return ( FALSE );
              UTL_SET_AND_INPLACE ( scratch, nxcn, scratch );
              UTL_SET_OR_INPLACE ( end_atoms, scratch, end_atoms );
            }
          }
        }
      }
    }
/* done growing paths if no more atoms added to any path */
    for (pdone = 0, nuats = 0; pdone < np; pdone++ )
      nuats += UTL_SET_CARDINALITY( p[pdone].path );
    if (nuats <= nats && !ringclosed) growing = FALSE;
    nats = nuats;
/* .. or looking for the 4th atom and found it .. */
    if (toroot) for (pdone = 0; pdone < np; pdone++ )
      if (UTL_SET_MEMBER ( p[pdone].path, a4 )) growing = FALSE;
/* .. or after 100 atom layers out regardless */
    ncycles++;
    if (ncycles >= 100) growing = FALSE;
  }

/* debugging */
  if (debug) for (pdone = 0; pdone < np; pdone++ ) {
    sprintf ( tempString, "Path %d (%d rings, from %d) : ",
         pdone+1, p[pdone].nrings, p[pdone].root );
    UBS_OUTPUT_MESSAGE ( stdout, tempString );
    ashow( p[pdone].path, ml );
  }

/* Compute the path properties */
  for (pdone = 0; pdone < np; pdone++) {
/* mark as already chosen any path that can't be an answer */
    p[pdone].chosen = toroot && !UTL_SET_MEMBER (p[pdone].path, a4);
    p[pdone].nats = UTL_SET_CARDINALITY ( p[pdone].path );
    p[pdone].nrings = p[pdone].nrings ? 1 : 0;
    p[pdone].mw = 0.0;
    p[pdone].xyz[0] = p[pdone].xyz[1] = p[pdone].xyz[2] = 0.0;
  }

/* return the best result */
  best = 0;
  for (pdone = 1; pdone < np; pdone++) {
    if (toroot) {
      if (p[best].chosen && !p[pdone].chosen) best = pdone;
/* looking backward along chain, always grow away from more negative coord value */
      if ( !p[best].chosen && !p[pdone].chosen) {
        get_path_xyz ( p[pdone].root, ml, p[pdone].xyz );
        get_path_xyz ( p[best].root, ml, p[best].xyz );
        for ( i = 0; i < 3; i++ ) {
          diff = p[pdone].xyz[i] - p[best].xyz[i];
          if (diff < -0.1) {
            best = pdone;
            break;
          }
          if (diff > 0.1 ) break;
/* checking other coords if basically tied at this coord */
        }
      }
    }
    else {
      if (p[pdone].nrings && !p[best].nrings) best = pdone;
      else if (p[pdone].nats > p[best].nats) best = pdone;
      else if (p[pdone].nats == p[best].nats) {
        p[pdone].mw = get_path_mw( p[pdone].path, ml, p[pdone].mw );
        p[best].mw = get_path_mw( p[best].path, ml, p[best].mw );
        if (p[pdone].mw > p[best].mw) best = pdone;
      }
    }
  }

  arec = SYB_ATOM_FIND_REC ( ml, p[best].root );
  sprintf (tempString, "%d", arec->id );
  if (!(*Writer)(tempString)) goto error;

  retval = TRUE;
error:
cleanup:
  if ( atom_exp_list )
    SYB_EXPR_DELETE_RPN_LIST( atom_exp_list );
  if (atom_setl)
    UTL_SET_DESTROY (atom_setl);
  if (end_atoms)
    UTL_SET_DESTROY (end_atoms);
  if (a2chk)
    UTL_SET_DESTROY (a2chk);
  if (nuls)
    UTL_SET_DESTROY (nuls);
  if (nxcn)
    UTL_SET_DESTROY (nxcn);
  if (cnats)
    UTL_SET_DESTROY (cnats);
  if (scratch)
    UTL_SET_DESTROY (scratch);
  return( retval );
}

static int syb_mgen_conn_att_atoms ( aset, m, atid )
/* ORs atoms attached to atm into aset */
/* WORKS STRICTLY WITH RECNOS */
set_ptr aset;
mol_ptr m;
int atid;
{
  atom_ptr at, SYB_ATOM_FIND_ID ();
  List_Ptr tohs, UTL_LIST_RETRIEVE_P ();
  atom_ptr toh, SYB_ATOM_FIND_REC ();
  acon_ptr connl;
  int nbytesl;

  at = SYB_ATOM_FIND_REC ( m, atid );
  tohs = at->conn_atom;
  while (tohs) {
    tohs = UTL_LIST_RETRIEVE_P ( tohs, &connl, &nbytesl );
    toh = SYB_ATOM_FIND_REC ( m, connl->target );
    UTL_SET_INSERT( aset, toh->recno );
  }
  return ( TRUE );
}

static float get_path_mw( aset, m, mw )
/* returns the total atomic weight of all atoms in aset */
set_ptr aset;
mol_ptr m;
float mw;
{
  int elem = -1;
  float ans = 0.0;
  atom_ptr at, SYB_ATOM_FIND_REC ();
  fpt SYB_ATAB_ATOMIC_WEIGHT ();

  if (mw) return ( mw );
  elem = -1;
  while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) {
    at = SYB_ATOM_FIND_REC ( m, elem );
    ans += (float) SYB_ATAB_ATOMIC_WEIGHT ( at->type );
  }
  return ( ans );
}

static void get_path_xyz ( aid, m, mw )
/* returns the xyz of the supplied atom */
int aid;
mol_ptr m;
float mw[3];
{
    int i;
    atom_ptr at, SYB_ATOM_FIND_REC ();

    if (mw[0]) return;
    at = SYB_ATOM_FIND_REC ( m, aid );
    for (i = 0; i < 3; i++) mw[i] = at->xyz[i];
    return;
}

static int ashow( aset, m )
/* for interactive debugging, shows a set's membership in terms of atom ID */
set_ptr aset;
mol_ptr m;
{
    char buff[1000], *b;
    atom_ptr at, SYB_ATOM_FIND_REC ();
    int elem;

    *buff = '\0';
    b = buff;
    elem = -1;

    while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) {
        at = SYB_ATOM_FIND_REC ( m, elem );
        sprintf( b, " %d", at->id );
        b = buff + strlen( buff );
    }

    sprintf( b, "\n" );
    UBS_OUTPUT_MESSAGE ( stdout, buff );
}

/* BEGINNING OF SUBROUTINES I-D. Calculation of attenuated fields */ 

/*+E : QSAR_FIELD_EVAL_RB_ATTEN ( ) */
/*************************************************************************/
/*                                                                       */
/* int QSAR_FIELD_EVAL_RB_ATTEN ( molp, stfldp, elfldp, regp, no_st, no_el, ctp ) */
/*                                                                       */
/* Dick Cramer May 13, 1995                                              */
/*                                                                       */
/*
   "Standard CoMFA" except that the contribution of any atom
   to the field falls off with an inverse power of its distance
   from a root atom, measured in NUMBER OF ROTATABLE BONDS!

   This means also that each individual atom's contribution
   has a similarly scaled upper bound, rather than checking
   the upper bound only for the sum over all atoms.
*/
/* This procedure computes vdW 6-12 steric values at each point in region */
/* and the electrostatic interactions (initially assuming 1/r dielectric). */

/* : QSAR_FIELD_EVAL_RB_ATTEN ( ) */

int QSAR_FIELD_EVAL_RB_ATTEN ( molp, stfldp, elfldp, regp, no_st, no_el, ctp )

mol_ptr molp;
FieldPtr stfldp, elfldp;
RegionPtr regp;
int no_st, no_el;
ComfaTopPtr ctp;
{
BoxPtr box;
atom_ptr at, SYB_ATOM_FIND_ID ();
int pid, b, ix, iy, iz, nat, vol_avg, repulsive;
fpt *steric, *elect, SYB_ATAB_VDW_RADII ();
fpt diff, dis, dis2, x, y, z, sum_steric, sum_elect;
fpt dis6, dis12, repuls_val, offs[9][3], atm_ste, atm_ele;
fpt *charge, *ctemp, *coord, *ftemp, *wt, scale_vol_avg, atm_steric, atm_elect;
int *atyp, *itemp, dohbd, dohba, ishbd, retval, dielectric, off, atid;
static fpt hbond_scal;
fpt hbond_A, hbond_B, *AtWts = NIL, *QSAR_FIELD_RB_WTS ();
int *HAs, *HDs, *HAp, *HDp; /* sets would be more efficient but slower */
int do_steric, do_elect;
set_ptr hdonor, SYB_HBOND_DONORS (), pset = NIL, aset = NIL;

#define Q2KC 332.0
#define MIN_SQ_DISTANCE 1.0e-4
/* any atom within 10^-2 Angstroms is hereby zapped!
   this is about it: 10^6 / 10^-24 is close to overflow! */

ftemp = NIL; ctemp = NIL; itemp = NIL; retval = FALSE; HAs = NIL; HDs = NIL;
hdonor = NIL;
/* for now, make root atom the one closest to 0,0,0 */
for (nat = 1; nat <= molp->natoms; nat++) {
    at = SYB_ATOM_FIND_ID ( molp, nat );
    dis2 = at->xyz[0] * at->xyz[0] + at->xyz[1] * at->xyz[1] +
           at->xyz[2] * at->xyz[2];
    if (nat == 1 || dis2 < dis) {
        dis = dis2;
        atid = nat;
    }
}


/* following is specific to topomeric fields */ 
if (!(AtWts = QSAR_FIELD_RB_WTS ( molp, atid ) )) goto cleanup; 

if (!no_el)
   {dielectric = elfldp->dielectric;
    vol_avg = elfldp->vol_avg_type;
    scale_vol_avg = elfldp->scale_vol_avg;
    repulsive = elfldp->repulsive;
    repuls_val = repexp[repulsive]; elect = elfldp->field_value; }
if (!no_st)
   {vol_avg = stfldp->vol_avg_type;
    scale_vol_avg = stfldp->scale_vol_avg;
    repulsive = stfldp->repulsive;
    repuls_val = repexp[repulsive]; steric = stfldp->field_value; }

if (!(ftemp = (fpt *) UTL_MEM_ALLOC (3*sizeof(fpt)*molp->natoms))) goto cleanup;
if (!(ctemp = (fpt *) UTL_MEM_ALLOC (sizeof(fpt)*molp->natoms))) goto cleanup;
if (!(itemp = (int *) UTL_MEM_ALLOC (sizeof(int)*molp->natoms))) goto cleanup;
if (!(HAs = (int *) UTL_MEM_ALLOC (sizeof(int)*molp->natoms))) goto cleanup;
if (!(HDs = (int *) UTL_MEM_ALLOC (sizeof(int)*molp->natoms))) goto cleanup;
/* get just those H's which are capable of Hbonding */
if (!(hdonor = SYB_HBOND_DONORS ( molp, NIL ))) goto cleanup;

for (coord=ftemp, atyp=itemp, charge=ctemp, HAp=HAs, HDp=HDs, nat=1;
     nat<=molp->natoms; nat++)
   {if (NIL == (at = SYB_ATOM_FIND_ID (molp, nat))) goto cleanup;
    *coord++ = at->xyz[0];
    *coord++ = at->xyz[1];
    *coord++ = at->xyz[2];
    *atyp++ = at->type - 1;
    *charge++ = at->charge;
    *HAp++ = SYB_ATAB_HBOND_ACCEPT (at->type);
    *HDp++ = UTL_SET_MEMBER (hdonor, at->recno); }

for (b=0; b<regp->n_boxes; b++) {
    box = &regp->box_array[b];
    dohbd = (SYB_ATAB_ATOMIC_NUMBER( box->atom_type ) == 1) &&
            (box->pt_charge == 1.0);
    dohba = (SYB_ATAB_ATOMIC_NUMBER( box->atom_type ) == 8) &&
            (box->pt_charge == -1.0);

    if (dohbd || dohba) {
        if (!TAILOR_STORE_IT_HERE ( "TAILOR!FORCE_FIELD!HBOND_RAD_SCALING",
                                    &hbond_scal, 1)) goto cleanup;
        hbond_A = pow( hbond_scal, 6.0 );
        hbond_B = hbond_A * hbond_A;
    }

    if (vol_avg)
        QSAR_FIELD_EVAL_GETOFF (offs, box->stepsize, vol_avg, scale_vol_avg);
    if (!no_st)
        QSAR_FIELD_VDWTAB ( box->atom_type, repuls_val, ctp->du_lp_steric );
    for (iz=0, z=box->lo[2]; iz < box->nstep[2]; iz++, z += box->stepsize[2])
    for (iy=0, y=box->lo[1]; iy < box->nstep[1]; iy++, y += box->stepsize[1])
    for (ix=0, x=box->lo[0]; ix < box->nstep[0]; ix++, x += box->stepsize[0])
    {
    for ( coord = ftemp, charge = ctemp, atyp = itemp, HAp=HAs, HDp=HDs,
          do_steric=TRUE, do_elect=TRUE, nat=0, sum_steric = sum_elect = 0.0;
          nat<molp->natoms;
          nat++, wt++)

    {

    if ( ( *atyp == DUMMY-1 || *atyp == LP-1 ) && !ctp->du_lp_elect )
        *charge = 0.0; /* set charge to 0 since ignoring Du/lp */
    if (!vol_avg) /* the "normal" case */
    {
        dis2 = x - *coord++;
        dis2 *= dis2;
        diff = y - *coord++;
        diff *= diff;
        dis2 += diff;
        diff = z - *coord++;
        diff *= diff;
        dis2 += diff;

        if ( !no_el && elfldp->zap_el==2 && do_elect ) {
            dis = sqrt ( dis2 );
            if ( dis < SYB_ATAB_VDW_RADII ( *atyp+1 ) ) {
                /* no shortcircuits! */
                *elect++ = 0.0;
                do_elect = FALSE;
            }
        }

        if ( dis2 < MIN_SQ_DISTANCE ) {
            if ( !no_st )
                /* if atom has no steric value, we don't care about
                   MIN_SQ_DISTANCE since it has no contribution anyway */
                if ( vdw_a[*atyp] != 0.0 && vdw_b[*atyp] != 0.0 ) {
                    /* set sterics to its max value at current grid pt. */
                    atm_steric = (*wt) * stfldp->max_value;
                }
            if ( !no_el && do_elect ) {
                if ( !no_st && !do_steric && elfldp->zap_el ) {
                    *elect++ = DAB_F_MISSING;
                }
                else if ( *charge != 0.0 ) {
                    if ( *charge > 0.0 )
                        atm_elect = (*wt) * elfldp->max_value;
                    else atm_elect = (*wt) * -elfldp->max_value;
                }
            }
            if ( !do_elect && !do_steric )
                break; /* break out of loop since neither el. or st.
                          need to be calculated for this grid point */

            /* setting dis2 to 1 (an arbitrary no.) will prevent a zero
               divide in the sum_steric or sum_elect calculations below */
            dis2 = 1.0;
        }

        if ( !no_st && do_steric ) {
            dis6 = dis2 * dis2 * dis2;
            dis12 = dis6 * dis6;
            if (repulsive)
                dis12 = (repulsive==1) ? dis12 / dis2 : dis12 / dis2 / dis2;
            if (dohbd && *HAp)
                atm_steric = hbond_B * vdw_b[*atyp]/dis12 -
                             hbond_A * vdw_a[*atyp]/dis6;
            else if (dohba && *HDp)
                atm_steric = hbond_B * vdw_b[*atyp]/dis12 -
                             hbond_A * vdw_a[*atyp]/dis6;
            else
                atm_steric = vdw_b[*atyp]/dis12 - vdw_a[*atyp]/dis6;
            HAp++; HDp++; }

        atm_steric = atm_steric > stfldp->max_value ? stfldp->max_value
                                                    : atm_steric;
        atm_steric *= (*wt);
        if ( !no_el && do_elect ) {
            atm_elect = *charge++ /
                        ( dielectric ? sqrt(dis2) : dis2 );
            atm_elect = atm_elect > elfldp->max_value ? elfldp->max_value
                                                      : atm_elect;
            atm_elect = atm_elect < -(elfldp->max_value) ? -(elfldp->max_value)
                                                         : atm_elect;
            atm_elect *= (*wt);
            sum_elect += atm_elect;
        }
        atyp++;
        sum_steric += atm_steric;
    }
    else
        for (off=0; off<9; off++)

    }

    coord += 3;
    atyp++;
    charge++;
    HAp++;
    HDp++;

} /* atom loop */ 
doneatoms:
    if ( do_steric || do_elect ) {
        if (vol_avg)
            { sum_elect /= 9.0; sum_steric /= 9.0; }
        if ( !no_el && do_elect )
        {   *elect = sum_elect * box->pt_charge * Q2KC;
            if ( *elect > elfldp->max_value ) *elect = elfldp->max_value;
            else if ( *elect < -elfldp->max_value ) *elect =
                                                    -elfldp->max_value;
            transform_field (elfldp->max_value, elect, ctp);
            elect++;
        }

        if ( !no_st && do_steric )
        {   *steric = sum_steric;
            if ( *steric > stfldp->max_value )
            {   *steric = stfldp->max_value;
                if (!no_el && elfldp->zap_el==1 ) *(elect-1) = DAB_F_MISSING; }
            transform_field (stfldp->max_value, steric, ctp);
            steric++; }
    }

    } /* points in box loop */
} /* boxes loop */

retval = TRUE;
cleanup:
if (itemp) UTL_MEM_FREE ( itemp );
if (ftemp) UTL_MEM_FREE ( ftemp );
if (ctemp) UTL_MEM_FREE ( ctemp );
if (HAs) UTL_MEM_FREE ( HAs );
if (HDs) UTL_MEM_FREE ( HDs );
if (hdonor) UTL_SET_DESTROY ( hdonor );
if (AtWts) UTL_MEM_FREE ( AtWts );
if (pset) UTL_MEM_FREE ( pset );
if (aset) UTL_MEM_FREE ( aset );
return retval;

#undef Q2KC
#undef MIN_SQ_DISTANCE
}
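
The per-atom capping and weighting performed by the routine above can be illustrated at a single grid point. The following is a minimal sketch under stated assumptions, not the SYBYL implementation: `ToyAtom` and `attenuated_point` are hypothetical names, the van der Waals coefficients are stored per atom instead of being looked up by atom type, volume averaging and hydrogen-bond scaling are omitted, and the electrostatic term uses the q/r^2 form that corresponds to a 1/r dielectric.

```c
#define Q2KC 332.0              /* same constant as in the listing */
#define MIN_SQ_DISTANCE 1.0e-4  /* same threshold as in the listing */

/* Hypothetical flat atom record (not a SYBYL structure). */
typedef struct {
    double xyz[3];
    double charge;
    double vdw_a, vdw_b;  /* attractive (r^-6) and repulsive (r^-12) terms */
    double weight;        /* rotatable-bond attenuation factor */
} ToyAtom;

/* Computes attenuated steric and electrostatic values at one grid point.
   Each atom's contribution is capped BEFORE it is weighted and summed. */
void attenuated_point(const ToyAtom *atoms, int n, const double pt[3],
                      double probe_charge, double max_value,
                      double *steric, double *elect)
{
    double sum_st = 0.0, sum_el = 0.0;
    int i, k;

    for (i = 0; i < n; i++) {
        const ToyAtom *a = &atoms[i];
        double d2 = 0.0, d6, st, el;

        for (k = 0; k < 3; k++) {
            double diff = pt[k] - a->xyz[k];
            d2 += diff * diff;
        }
        if (d2 < MIN_SQ_DISTANCE) {
            /* grid point sits on the atom: use the capped values directly */
            st = max_value;
            el = (a->charge > 0.0) ? max_value
               : (a->charge < 0.0) ? -max_value : 0.0;
        } else {
            d6 = d2 * d2 * d2;
            st = a->vdw_b / (d6 * d6) - a->vdw_a / d6;  /* 6-12 term */
            if (st > max_value) st = max_value;          /* per-atom cap */
            el = a->charge / d2;                         /* 1/r dielectric */
            if (el > max_value) el = max_value;
            if (el < -max_value) el = -max_value;
        }
        sum_st += a->weight * st;
        sum_el += a->weight * el;
    }

    *steric = sum_st > max_value ? max_value : sum_st;
    *elect = sum_el * probe_charge * Q2KC;
    if (*elect > max_value) *elect = max_value;
    if (*elect < -max_value) *elect = -max_value;
}
```

Because the cap is applied per atom before weighting, an atom several rotatable bonds from the root can never contribute more than its weight times the maximum field value, however close it lies to a grid point.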

static fpt *QSAR_FIELD_RB_WTS ( molp, rootid )
/* generates rotational-bond wts for each atom */
mol_ptr molp;
int rootid;

/* pseudo code for FIELD_RB_WTS()
   while saw new atoms
     uncover atoms that stopped last shell growth
     grow next "rotational shell"
     while adding to shell
       for each atom in shell
         get neighbors not seen
         for each neighbor
           if bond is rotatable (acyclic, >1 attached atom, not =,am,#)
             cover all other atoms attached to atom for this shell
           add it to shell
*/
{
fpt      *ansr = NIL, *vals = NIL, factor, nowfact = 1.0;
int      found, aggcount, atid, aggid, loop, size;
set_ptr  aggats = NIL, allats = NIL, nuls = NIL, endatms = NIL, end_cands = NIL;
atom_ptr root, SYB_ATOM_FIND_REC (), at, atrec;
bond_ptr b, SYB_BOND_FIND_REC ();
List_Ptr toats, UTL_LIST_RETRIEVE_P ();
acon_ptr cptr;
char     tempString[200];
void     ashow (), qsar_field_attached_atoms ();

if (!(vals = (fpt *) UTL_MEM_ALLOC (sizeof(fpt)*molp->natoms))) return ( NIL );
if (!UIMS2_VAR_GET_TOKEN ( "TAILOR!COMFA!AGGREG_DESCALE",
                           &factor )) return ( NIL );

if (!(allats = UTL_SET_CREATE ( molp->max_atoms + 1 ))) goto cleanup;
if (!(aggats = UTL_SET_CREATE ( molp->max_atoms + 1 ))) goto cleanup;
if (!(nuls = UTL_SET_CREATE ( molp->max_atoms + 1 ))) goto cleanup;
if (!(endatms = UTL_SET_CREATE ( molp->max_atoms + 1 ))) goto cleanup;
if (!(end_cands = UTL_SET_CREATE ( molp->max_atoms + 1 ))) goto cleanup;
if (!(root = SYB_ATOM_FIND_REC ( molp, rootid ))) goto cleanup;
UTL_SET_INSERT ( aggats, root->recno );
UTL_SET_INSERT ( allats, root->recno );
aggcount = loop = 1;

while (TRUE) {
    while (TRUE) {
        aggid = -1;
        while ((aggid = UTL_SET_NEXT ( allats, aggid )) >= 0 ) {
            UTL_SET_CLEAR ( nuls );
            qsar_field_attached_atoms ( nuls, molp, aggid );
            UTL_SET_DIFF_INPLACE ( nuls, allats, nuls );
            UTL_SET_DIFF_INPLACE ( nuls, endatms, nuls );
/* identifying any atoms that terminate this aggregate */
            atid = -1;
            while ((atid = UTL_SET_NEXT ( nuls, atid )) >= 0 ) {
                if (!(at = SYB_ATOM_FIND_REC ( molp, atid ))) goto cleanup;
/* skipping monovalent atoms */
                if (at->nbond > 1) {
/* find bond record that attaches to aggid */
                    toats = at->conn_atom;
                    found = FALSE;
                    while (toats && !found) {
                        toats = UTL_LIST_RETRIEVE_P ( toats, &cptr, &size );
                        found = (cptr->target == aggid);
                    }
                    if (!found) goto cleanup;
                    b = SYB_BOND_FIND_REC ( molp, cptr->bond_rec );
                    if ( !(b->status & BOND_V_IRING) && !(b->status & BOND_V_ERING)
                         && (b->type == SYB_BTAB_MNEM_TO_TYPE ( "1" )) ) {
/* have an end-of-aggregate atom, mark as end atoms all other attached atoms */
                        UTL_SET_CLEAR ( end_cands );
                        qsar_field_attached_atoms ( end_cands, molp, at->recno );
                        UTL_SET_DELETE ( end_cands, aggid );
                        UTL_SET_OR_INPLACE ( endatms, end_cands, endatms );
                    }
                }
            }
            UTL_SET_OR_INPLACE ( aggats, nuls, aggats );
        }
        if ( UTL_SET_CARDINALITY ( aggats ) <= aggcount ) break;
        aggcount = UTL_SET_CARDINALITY ( aggats );
        UTL_SET_OR_INPLACE ( allats, aggats, allats );
    }

/* debugging stuff */
    sprintf( tempString, "Aggregate %d (weight = %f)", loop, nowfact );
    UBS_OUTPUT_MESSAGE ( stdout, tempString );
    ashow( aggats, molp );

/* if no atoms added, we are done! */
    if (UTL_SET_EMPTY ( aggats )) break;
/* record scaling factor for atoms in this aggregate */
    atid = -1;
    while ((atid = UTL_SET_NEXT ( aggats, atid )) >= 0 ) {
        if (!(atrec = SYB_ATOM_FIND_REC ( molp, atid ))) goto cleanup;
        vals[ (atrec->id) - 1 ] = nowfact;
    }

    UTL_SET_OR_INPLACE ( allats, aggats, allats );
    UTL_SET_CLEAR ( aggats );
    UTL_SET_CLEAR ( endatms );
    aggcount = 0;
    nowfact *= factor;
    loop++;
}

ansr = vals;

cleanup:
if (aggats) UTL_SET_DESTROY ( aggats );
if (allats) UTL_SET_DESTROY ( allats );
if (endatms) UTL_SET_DESTROY ( endatms );
if (end_cands) UTL_SET_DESTROY ( end_cands );
if (nuls) UTL_SET_DESTROY ( nuls );

return ( ansr );
}
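
The shell-growth scheme described in the pseudo code above can be tried on a toy molecule. The sketch below is an illustration under stated assumptions and not the SYBYL routine: `ToyMol`, `toy_add_bond`, and `toy_rb_weights` are hypothetical names, the molecule is held as an adjacency matrix, and each bond carries a caller-supplied flag saying whether it is an acyclic single bond.

```c
#include <string.h>

#define MAX_ATOMS 16

/* Hypothetical toy molecule (not a SYBYL structure). */
typedef struct {
    int n;                                /* number of atoms, <= MAX_ATOMS */
    int bonded[MAX_ATOMS][MAX_ATOMS];     /* 1 if a bond exists */
    int rotatable[MAX_ATOMS][MAX_ATOMS];  /* 1 if acyclic single bond */
} ToyMol;

void toy_add_bond(ToyMol *m, int a, int b, int rot)
{
    m->bonded[a][b] = m->bonded[b][a] = 1;
    m->rotatable[a][b] = m->rotatable[b][a] = rot;
}

static int toy_degree(const ToyMol *m, int a)
{
    int i, d = 0;
    for (i = 0; i < m->n; i++) d += m->bonded[a][i];
    return d;
}

static int toy_count(const int *set, int n)
{
    int i, c = 0;
    for (i = 0; i < n; i++) c += set[i];
    return c;
}

/* Sets w[i] = factor^k, where k is the index of the shell holding atom i. */
void toy_rb_weights(const ToyMol *m, int root, double factor, double *w)
{
    int seen[MAX_ATOMS] = {0}, shell[MAX_ATOMS] = {0}, blocked[MAX_ATOMS] = {0};
    double now = 1.0;
    int a, nb, k, size = 1;

    for (a = 0; a < m->n; a++) w[a] = 0.0;
    seen[root] = shell[root] = 1;

    for (;;) {
        for (;;) {                       /* grow the current shell */
            for (a = 0; a < m->n; a++) {
                if (!seen[a]) continue;
                for (nb = 0; nb < m->n; nb++) {
                    if (!m->bonded[a][nb] || seen[nb] || blocked[nb]) continue;
                    /* crossing a rotatable bond: hold back what lies beyond */
                    if (toy_degree(m, nb) > 1 && m->rotatable[a][nb])
                        for (k = 0; k < m->n; k++)
                            if (m->bonded[nb][k] && k != a) blocked[k] = 1;
                    shell[nb] = 1;
                }
            }
            if (toy_count(shell, m->n) <= size) break;   /* no growth */
            size = toy_count(shell, m->n);
            for (a = 0; a < m->n; a++) if (shell[a]) seen[a] = 1;
        }
        if (toy_count(shell, m->n) == 0) break;          /* all atoms done */
        for (a = 0; a < m->n; a++)
            if (shell[a]) { w[a] = now; seen[a] = 1; }
        memset(shell, 0, sizeof shell);
        memset(blocked, 0, sizeof blocked);
        size = 0;
        now *= factor;
    }
}
```

For a five-atom chain 0-1-2-3-4 with a one-atom branch 5 on atom 1, rooted at atom 0 with a factor of 0.5, the sketch assigns weight 1 to atoms 0 and 1, 0.5 to atoms 2 and 5, 0.25 to atom 3, and 0.125 to atom 4. As in the listing, the atom at the far end of a rotatable bond still belongs to the shell that reached it; only the atoms beyond it are deferred.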




static void qsar_field_attached_atoms ( aset, m, atid )
/* ORs atoms attached to atom atid into aset */
/* WORKS STRICTLY WITH RECNOS */
set_ptr aset;
mol_ptr m;
int atid;
{
    atom_ptr at, SYB_ATOM_FIND_ID ();
    List_Ptr tohs, UTL_LIST_RETRIEVE_P ();
    atom_ptr toh, SYB_ATOM_FIND_REC ();
    acon_ptr connl;
    int nbytesl;

    at = SYB_ATOM_FIND_REC ( m, atid );
    tohs = at->conn_atom;
    while (tohs) {
        tohs = UTL_LIST_RETRIEVE_P ( tohs, &connl, &nbytesl );
        toh = SYB_ATOM_FIND_REC ( m, connl->target );
        UTL_SET_INSERT ( aset, toh->recno );
    }
    return;
}

static void ashow( aset, m )
/* for interactive debugging, shows a set's membership in terms of atom ID */
set_ptr aset;
mol_ptr m;
{
    char buff[1000], *b;
    atom_ptr at, SYB_ATOM_FIND_REC ();
    int elem;

    *buff = '\0';
    b = buff;
    elem = -1;

    while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) {
        at = SYB_ATOM_FIND_REC ( m, elem );
        sprintf( b, " %d", at->id );
        b = buff + strlen( buff );
    }
    sprintf( b, "\n" );
    UBS_OUTPUT_MESSAGE ( stdout, buff );
}

# Section II-A. SPL invoked shell for computing the diagonal defining the
# "best" triangle, e.g., the one with the highest density of points below.
#

@expression_generator LRT_FAST

# Usage:
# lrt_fast rows descriptor_cols bio_col [pls flags like scaling in quotes]
# rows (*) - rows to take
# descriptor_cols - which columns are the neighborhood metrics
# bio_col - which column has the bio (probably log bio) data
# [...] - if need to SCAL NONE or anything like that, do it here
#
# returns a line of the form
# 3.09691 / 0.000546509 = 5666.71 - 496 : 496 :: 15.6981 : 15.6989
# ^ max bio difference
#           ^ optimal distance division for max bio
#                         ^ slope
#                                   ^ number in the lrt
#                                         ^ total number
#                                                ^ area in the lrt
#                                                          ^ total area
#
# Significance is related to whether ratio of numbers is
# much above ratio of areas.

# H 

globalvar SAMPLS_IN_PROGRESS DONE_CHECKED_OUT
localvar hold distname rows cols bio

setvar rows %promptif( "$1" ROW_EXP "*" "Rows to use in lrt")
setvar cols %promptif( "$2" COL_EXP "COMFA*" "Columns of mol descriptors")
setvar bio %promptif( "$3" COL_EXP "LOGBIO" "Column of bio data")

setvar hold SAMPLS_IN_PROGRESS
setvar SAMPLS_IN_PROGRESS $bio

setvar distname TAILOR!HIER!DIST_FNAME
setvar TAILOR!HIER!DIST_FNAME lrt_fort.3

# here the information is computed and written to a file 

# whose name is passed in via a TAILOR value 
QSAR ANA DO I >$NULLDEV $rows $cols HIER $4 | 

setvar SAMPLS_IN_PROGRESS $hold 

setvar TAILOR!HIER!DIST_FNAME $distname

# contents of the file are returned to the caller 
setvar hold %system("cat lrt_fort.3") 

%return( "$hold" ) 



# 

# Section II-B. SPL script for computing the significance of the distribution
# found by lrt_fast
#

@expression_generator dochi

# computes the chi-square statistic for the number of points below
# the diagonal, null hypothesis being the area fraction of the total.
#
# To be called as: %dochi( %lrt_fast( ... ) ), i.e., its inputs
# are exactly the output of %lrt_fast as described in the lrt_fast header.

# 

setvar expected %math( $9 * $11 / $13 ) 
setvar sq %math( $7 - $expected ) 
setvar sq %math( $sq * $sq / $expected ) 
%return( $sq ) 
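
The arithmetic performed by dochi can be restated in C. In the output line of lrt_fast, token 7 is the number of points in the lrt, token 9 the total number of points, token 11 the area of the lrt, and token 13 the total area. The function below is a minimal sketch of the same calculation; `lrt_chi_square` is a hypothetical name and is not part of SYBYL.

```c
/* Chi-square statistic for the number of points found in the lrt.
   The expected count under the null hypothesis is the total number of
   points multiplied by the fraction of the total area that the lrt covers. */
double lrt_chi_square(int n_in_lrt, int n_total,
                      double area_lrt, double area_total)
{
    double expected = n_total * area_lrt / area_total;
    double diff = n_in_lrt - expected;
    return diff * diff / expected;
}
```

For example, 80 of 100 points falling in a region that covers half of the total area gives an expected count of 50 and a statistic of 18.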



/* Section II-C. Computes the best diagonal in the "virtual graph" of biological
distances vs property differences. */

int QSHELL_HIER_LRT ( table, biocol, dmat, nrow, order, lmsg )
char *table;
int biocol,   /* column in MSS with biological data */
    nrow,     /* dimension of dmat and order */
    *order;   /* array of row IDs to consider */
fpt *dmat;    /* distance matrix for property distances */
char *lmsg;   /* file name for results */
{
fpt *p, *q, fabs(), bmax;
int i, j, count, status_array;
char *fpt_colname;
FILE *out, *UTL_FILE_FOPEN ();

/* need to get the bio values
   in the n^2 we can repack into n(n-1)/2 then add the n bio values
   and finish with the bio distances */

/* No error handling. Better be data in those rows! */

for (count=0, i=0; i<nrow; i++)
    for (j=0; j<i; j++)
        dmat[count++] = dmat[i*nrow + j];

q = p = dmat + ((nrow-1) * nrow) / 2;

TBL_ACCESS_INDEX_TO_COLNAME ( table, biocol-1, &fpt_colname );
TBL_GRAB_INIT_FPTS ( table, 1, &fpt_colname );
for (i=0; i<nrow; i++, p++)
    TBL_GRAB_GET_FPTS_INV ( order[i]-1, &status_array, p );
TBL_GRAB_COMPLETE_FPTS ();

bmax = 0.0;

for (count=0, i=0; i<nrow; i++)
    for (j=0; j<i; j++, count++)
        if ( (p[count] = fabs(q[i] - q[j])) > bmax ) bmax = p[count];

out = UTL_FILE_FOPEN ( lmsg, "w" );
QSHELL_HIER_DO_LRT ( out, count, dmat, p, bmax );

UTL_FILE_FCLOSE ( out );
}
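
The two bookkeeping steps of the routine above, repacking the square distance matrix into its lower triangle and forming the pairwise biological differences in the same order, can be sketched separately. This is an illustrative sketch only: `pack_lower_triangle` and `pairwise_abs_diff` are hypothetical names, and plain `double` arrays stand in for the `fpt` buffers of the listing.

```c
#include <math.h>

/* Packs the strict lower triangle of a row-major n x n matrix into the
   front of the same buffer and returns the number of entries, n(n-1)/2.
   Packing in place is safe because each write lands at or before the
   position it was read from. */
int pack_lower_triangle(double *d, int n)
{
    int i, j, count = 0;
    for (i = 0; i < n; i++)
        for (j = 0; j < i; j++)
            d[count++] = d[i * n + j];
    return count;
}

/* Fills out[] with |v[i] - v[j]| in the same (i, j < i) order and
   returns the largest difference found. */
double pairwise_abs_diff(const double *v, int n, double *out)
{
    int i, j, count = 0;
    double bmax = 0.0;
    for (i = 0; i < n; i++)
        for (j = 0; j < i; j++, count++) {
            out[count] = fabs(v[i] - v[j]);
            if (out[count] > bmax) bmax = out[count];
        }
    return bmax;
}
```

Keeping both arrays in the same pair order lets a single index address the property distance and the biological difference of one pair of molecules.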
int QSHELL_HIER_DO_LRT ( out, index, xsort, ysort, bmax )
FILE *out;
fpt *xsort, *ysort, bmax;
int index;
{
int *order, count, j, i, bad;
int bestN, bestl;
fpt den, bestDen;

#define CUTOFF ( bmax * ( xsort[order[i]] / xsort[order[j]] ) )

if (!(order = (int *) UTL_MEM_ALLOC ( index * sizeof(int) ))) return 0;
for (i=0; i<index; i++) order[i] = i;
bestN = bestl = bad = 0;
bestDen = 0.0;

fpt_heapsort ( index, xsort, order );
for ( j=0; count=0, bad=0, j<index; j++ )
{
    if (xsort[order[j]] <= 0.0) continue;
    for (i=0; i<=j; i++)
    {
        if (ysort[order[i]] <= CUTOFF) count++;
        else bad++;
    } /* loop over all d <= this distance */
    if ( (den = count / bmax / xsort[order[j]] * 2.0) > bestDen )
        {bestDen = den; bestl = j; bestN = index - bad;}
} /* loop over all distances */

den = bmax * xsort[order[index-1]];

sprintf ( msg, "%g / %g = %g - %d : %d :: %g : %g\n",
          bmax, xsort[order[bestl]], bmax/xsort[order[bestl]],
          bestN, index, den - xsort[order[bestl]]*bmax/2.0, den );
UBS_OUTPUT_MESSAGE ( out, msg );
UTL_MEM_FREE ( order );
return 1;
}
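
The search for the best diagonal can be illustrated without the index array by sorting the pairs themselves. The sketch below is a simplified illustration, not the SYBYL routine: `Pair`, `by_x`, and `best_diagonal` are hypothetical names, and the standard library `qsort` replaces the indexed heapsort of the listing.

```c
#include <stdlib.h>

/* One pair of molecules: x is the property distance, y the absolute
   difference in biological activity. */
typedef struct { double x, y; } Pair;

static int by_x(const void *a, const void *b)
{
    double d = ((const Pair *)a)->x - ((const Pair *)b)->x;
    return (d > 0) - (d < 0);
}

/* Sorts pts by x and returns the index of the pair whose x value defines
   the densest triangle under the line from the origin to (x, bmax).
   *best_density receives the points-per-area value of that triangle. */
int best_diagonal(Pair *pts, int n, double bmax, double *best_density)
{
    int i, j, best = 0;

    *best_density = 0.0;
    qsort(pts, n, sizeof *pts, by_x);
    for (j = 0; j < n; j++) {
        int count = 0;
        double density;

        if (pts[j].x <= 0.0) continue;
        for (i = 0; i <= j; i++)       /* points at or below the diagonal */
            if (pts[i].y <= bmax * pts[i].x / pts[j].x) count++;
        density = count / (bmax * pts[j].x / 2.0);
        if (density > *best_density) { *best_density = density; best = j; }
    }
    return best;
}
```

The nested loop makes the search quadratic in the number of pairs, as in the listing, where the number of pairs is itself n(n-1)/2 for n molecules.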
/* n is number of elements
   arrin is array of floats to be sorted
   indx is array of ints initially 0...n-1
*/

int fpt_heapsort ( n, arrin, indx )
int n;
fpt *arrin;
int *indx;
{
int l, ir, indxt, i, j;
fpt q;

l = n/2;
ir = n - 1;

while (TRUE) /* the "10" loop */
{
    if (l>0) { indxt = indx[--l]; q = arrin[indxt]; }
    else
    {
        indxt = indx[ir]; q = arrin[indxt];
        indx[ir--] = indx[0];
        if ( ir <= 0 )
            { indx[0] = indxt; return 1; } /* <=== Only way out! */
    }
    i = l;
    j = l + l + 1;

    while (j <= ir) /* the "20" loop */
    {
        if ( (j<ir) && (arrin[indx[j]] < arrin[indx[j+1]]) ) j++;
        if (q < arrin[indx[j]]) { indx[i] = indx[j]; i = j; j = j+j+1; }
        else { j = ir+1; }
    }
    indx[i] = indxt;
}
}
/* SECTION III-A. Declarations for all non-standard data structures referenced
in the C code functions shown in Sections I and II. */
/*************************************************************************/
/*                                                                       */
/*   Molecule and Supporting Structure Definitions                       */
/*                                                                       */
/*   John McAlister 09-Aug-1985                                          */
/*                                                                       */
/*   This file contains the definitions for the molecular data struc-    */
/*   tures required within SYBYL. The contents of this file are          */
/*   described in detail in the document "SYBYL Molecular Data Struc-    */
/*   tures".                                                             */
/*                                                                       */
/*************************************************************************/

/* Define the molecule descriptor template */

typedef struct molecule_struct {
    char     *name;        /* pointer to molecule name                   */
    i32      type;         /* molecule type                              */
    List_Ptr dict;         /* list of dictionaries used with molecule    */
    i32      status;       /* molecule status                            */
    char     *comment;     /* pointer to comment for molecule            */
    stamp    cre_time;     /* creation time/user/version stamp           */
    stamp    mod_time;     /* modification time/user/version stamp       */
    int      max_props;    /* maximum properties currently allocated     */
    int      nprops;       /* number of molecular properties             */
    List_Ptr props;        /* pointer to list of properties              */
    int      max_feats;    /* maximum features currently allocated       */
    int      nfeats;       /* number of molecular features               */
    List_Ptr feats;        /* pointer to list of molecular features      */
    int      max_subst;    /* maximum substructures currently allocated  */
    int      nsubst;       /* number of substructures in molecule        */
    List_Ptr subst;        /* pointer to list of substructures           */
    List_Ptr subst_roots;  /* pointer to list of root subst offsets      */
    int      max_atoms;    /* maximum atoms currently allocated          */
    int      natoms;       /* number of atoms in molecule                */
    List_Ptr atoms;        /* pointer to atom array segment list         */
    int      max_bonds;    /* maximum bonds currently allocated          */
    int      nbonds;       /* number of bonds in molecule                */
    List_Ptr bonds;        /* pointer to bond array segment list         */
    int      charges;      /* type of atomic charges, if present         */
    fpt      vector[3];    /* translation vector for molecule            */
    fpt      matrix[9];    /* rotation matrix for molecule               */
    List_Ptr assoc_data;   /* pointer to list of associated data         */
                           /* descriptors                                */
} molecule, *mol_ptr;


/*************************** ATOM DEFINITION *******************************/

/* Define the atom entry record template */
typedef struct atom_struct {
    char     *name;      /* atom name                                    */
    int      type;       /* atom type                                    */
    i32      status;     /* atom status                                  */
    int      recno;      /* cumulative atom record number                */
    int      id;         /* atom id (logical atom number)                */
    int      link;       /* link to next atom record                     */
    int      subst;      /* offset to substructure containing atom       */
    List_Ptr property;   /* pointer to list of properties for atom       */
    List_Ptr feature;    /* pointer to list of features including        */
                         /* this atom                                    */
    int      nbond;      /* number of bonds involving this atom          */
    List_Ptr conn_atom;  /* pointer to list of bonded atoms              */
    fpt      xyz[3];     /* coordinates of atom                          */
    fpt      charge;     /* point charge on atom                         */
} atom, *atom_ptr;

/* Define the atom array segment descriptor template */
typedef struct atom_seg_struct {
    atom_ptr seg_head;   /* pointer to head of atom array segment        */
    mol_ptr  molecule;   /* pointer to molecule containing atom seg      */
    int      max_atom;   /* maximum number of atom records in seg        */
    int      natom;      /* number of filled atom records in seg         */
    int      used_atom;  /* offset to first filled record in segment     */
    int      free_atom;  /* offset to first free record in segment       */
} atom_seg, *aseg_ptr;

/* Define the bond specifier records pointed to by the atom records */
typedef struct atom_conn_struct {
    int target;          /* offset to target atom                        */
    int bond_rec;        /* offset to bond descriptor record             */
} atom_conn, *acon_ptr;

/*************************** BOND DEFINITION *******************************/

/* Define the bond entry record template */
typedef struct bond_struct {
    int      type;       /* bond type                                    */
    i32      status;     /* bond status                                  */
    int      recno;      /* cumulative bond record number                */
    int      id;         /* bond id (logical bond number)                */
    int      link;       /* link to empty bond record                    */
    List_Ptr property;   /* pointer to bond property list                */
    List_Ptr feature;    /* pointer to list of features including        */
                         /* this bond                                    */
    int      o_subst;    /* offset to origin atom substructure           */
    int      origin;     /* offset to atom at bond origin                */
    int      t_subst;    /* offset to target atom substructure           */
    int      target;     /* offset to atom at bond destination           */
} bond, *bond_ptr;

/* Define the bond array segment descriptor template */
typedef struct bond_seg_struct {
    bond_ptr seg_head;   /* pointer to head of bond array segment        */
    mol_ptr  molecule;   /* pointer to molecule containing bond seg      */
    int      max_bond;   /* maximum number of bonds in segment           */
    int      nbond;      /* number of filled bond records in seg         */
    int      used_bond;  /* offset to first filled record in segment     */
    int      free_bond;  /* offset to first free record in segment       */
} bond_seg, *bseg_ptr;

/* ===== comfa.h ====== */

/* Regions are the set of points at which energy evaluations are made    */
/* in the CoMFA method of QSAR. A region is defined as the union         */
/* of a set of 3D boxes (which may be a single point in the              */
/* limit) and their associated attributes. Attributes needed for         */
/* CoMFA purposes are outlined below.                                    */
/*                                                                       */
/*************************************************************************/

#ifndef QSAR_COMFA_DEFINITIONS
#define QSAR_COMFA_DEFINITIONS 1

#include "ta_types.h"

#define DUMMY 26 /* dummy atom id */
#define LP 20    /* lone pair atom id */



typedef enum {
    FDENGY_UNKNOWN,
    FDENGY_ELECT,
    FDENGY_STERIC,
    FDENGY_HOMO,
    FDENGY_LUMO,
    DOCK_ELECT,
    DOCK_STA_NOHB,
    DOCK_STA_HBD,
    DOCK_STA_HBA,
    DOCK_STB_NOHB,
    DOCK_STB_HBD,
    DOCK_STB_HBA } FldEngyTyp;

typedef enum {
    FDHD_ORIGINAL,
    FDHD_FFIT,
    FDHD_XTERN,
    FDHD_FUNC,
    FDHD_USER,
    FDHD_USR_AVG,
    FDHD_DOCK,
    FDHD_AVG,
    FDHD_SIG,
    FDHD_MAX,
    FDHD_MIN,
    FDHD_COEFF,
    FDHD_AVG_X,
    FDHD_SIG_X,
    FDHD_FLD_X,
    FDHD_RANGE,
    FDHD_PLS_XWT,
    FDHD_PLS_XLOAD,
    FDHD_FAC_LOAD,
    FDHD_FAC_COMM,
    FDHD_FAC_ROTLOAD,
    FDHD_SIMCA_LOAD,
    FDHD_SIMCA_MODEL,
    FDHD_SIMCA_DISCRIM,
    FDHD_HBD } FldHowTyp;


typedef struct {
    fpt lo[3],         /* corner with lowest values for each axis        */
        hi[3],         /*   "     "   hi-est   "     "   "    "          */
        stepsize[3];   /* increment between points                       */
    int nstep[3],      /* derived as 1 + (hi-lo + epsilon) / stepsize    */
        n;             /* n = product of nstep[i]                        */
    int atom_type;     /* SYBYL atom type, for steric energy computation */
    fpt pt_charge;     /* elemental charge at point, for electrostatics  */
    fpt *weight;       /* weight[n] is applied in all computations, e.g=1 */
    int avg_type;      /* box of 'scale', sphere, sphere x vdw, ...?     */
    fpt avg_scale;     /* scale whose meaning derived from avg_type      */
    int arb,           /* arbitrary int for later use                    */
        *parb;         /*     "     pointer   "    "                     */
} Box, *BoxPtr;

typedef struct {
    char   *filename;  /* name of the region's file (if any)             */
    int    n_boxes;    /* number of boxes which make up the region       */
    int    n_points;   /* number of points in this region altogether     */
    BoxPtr box_array;  /* box_array[n_regions], each one a Box           */
    int    n_refs;     /* number of CURRENT references to this memory    */
    long   when_made;  /* creation stamp                                 */
} Region, *RegionPtr;



typedef struct {
    char       *reg_name;     /* name of the region's file (if any)         */
    char       *fld_name;     /* name of this field's file (if any)         */
    RegionPtr  reference;     /* the region referenced by this field        */
    FldEngyTyp fld;           /* what type of field is referenced here      */
    int        num_avgd;      /* number of fields averaged into this one    */
    int        curr_iter;     /* number of iterations in current field fit  */
                              /* run                                        */
    char       *mol_id;       /* unspecified molecule id,                   */
                              /* e.g. dbname/molname/alignname              */
    int        n_points;      /* number of points in associated region      */
    int        zap_el;        /* whether electrostatics are MISSING when>max_st */
    fpt        max_value;     /* largest permitted absolute value of energy */
    fpt        *field_value;  /* values at each point of the field          */
    int        n_refs;        /* number of CURRENT references to this memory */
    long       when_made;     /* creation stamp                             */
    /* added these 4 items 1/30/89 DEP */
    int        vol_avg_type;
    fpt        scale_vol_avg;
    int        dielectric;
    int        repulsive;
    FldHowTyp  how_made;      /* perry's way = 1 or old way = 0             */
} Field, *FieldPtr;

/* molecule dependent information solicited by QSAR table operations,
   passed into COMFA column field evaluations */

typedef struct {
    boolean  already_field;  /* whether a field name exists (otherwise alignment) */
    char     *some_name;     /* name of alignment; Nil align==use as is (!)  */
    char     *steric_name;   /* name of steric field (if applicable)         */
    char     *elect_name;    /* name of electrostatic field (if applicable)  */
    FieldPtr sfld_p;         /* points to steric field in memory (when there) */
    FieldPtr efld_p;         /* points to elect. field in memory (when there) */
} ComfaMol, *ComfaMolPtr;



/* molecule-independent information for CoMFA evaluations */
typedef struct {
    int       vol_avg;       /* 0,1,2=none,box,sphere (0)                    */
    fpt       vol_scale;
    int       fld_types;     /* case for what fields: 0,1,2=both,steric,elect. (0) */
    fpt       steric_max;    /* maximum steric energy (30)                   */
    int       repulsive;     /* steric repulsive exponent - 12, 10, or 8 (12) */
    fpt       elect_max;     /* maximum electrostatic energy (30)            */
    int       dielectric;    /* case for dielectric (AS FORCE FIELD TAILOR)  */
    int       elect_out;     /* case to drop elect inside steric max: 0,1=T,F (1) */
    char      *region_name;  /* name of region used in the CoMFA computations */
    FieldPtr  sweight_fld;   /* points to MEMORY field for weighting steric PLS */
    FieldPtr  eweight_fld;   /* points to MEMORY field for weighting elect. PLS */
    FldHowTyp how_done;      /* perry's way = 1 or old way = 0               */
    int       du_lp_steric;  /* include dummies and lone pairs in steric field */
                             /* calculations                                 */
    int       du_lp_elect;   /* include dummies and lone pairs in electrostatic */
                             /* field calculations                           */
    int       spare1;        /* As of 6.1comfa, this is TAILOR!COMFA!TRANSFORM */
    int       spare2;        /* INDICATOR SCALE among other things           */
} ComfaTop, *ComfaTopPtr;
#endif

Section III-B. Functional descriptions of external procedures. 
(Routines that simply return dynamic memory to the heap are not 
described.) 

BOND_V_ERING - TRUE if bond is in an external ring. 

BOND_V_IRING - TRUE if bond is in an internal (simple) ring. 

QSAR_FIELD_EVAL_GETOFF - provides coordinates for field 
computation when "volume averaging" is being done. 

QSAR_FIELD_VDWTAB - returns steric parameters for the 
computation of the field contribution from the probe atom and each 
of the molecule atoms. 

SYB_AREA_GET_MOLECULE - returns the internal representation of 
the molecule in some area or "container", if such exists. 

SYB_ATAB_ATOMIC_NUMBER - returns the atomic number of the 
specified atom type. 

SYB_ATAB_ATOMIC_WEIGHT - returns the atomic weight of the 
specified atom type. 

SYB_ATAB_HBOND_ACCEPT - returns TRUE if the specified atomic 
type is a hydrogen-bond accepting atom. 

SYB_ATAB_VDW_RADII - returns the atomic radius of the specified 
atomic type. 

SYB_ATOM_FIND_ID - returns the internal representation of an atom 
referenced by its atom ID number. (Atom IDs are guaranteed to be 
continuous but the ID of any single atom may change as atoms are 
added or deleted.) 

SYB_ATOM_FIND_REC - returns the internal representation of an 
atom referenced by its record ID number. (Atom record IDs are 
invariant but there may be "holes" in their sequence such that the 
largest record ID may be greater than the number of atoms.) 

SYB_ATOM_FIND_SET - returns the bitset of atoms corresponding to 
a list of atoms. 

SYB_BOND_FIND_REC - returns the internal representation of a bond 
referenced by its (invariant) record ID number. 

SYB_BTAB_MNEM_TO_TYPE - converts an ASCII representation of a 
bond type to its internal representation. 

SYB_EXPR_ANALYZE - parses a user-entered ASCII description of 
atoms (e.g., M2(<H>) for all hydrogen atoms within molecule M2) 
into internally valid representations of molecule and atoms. 

SYB_HBOND_DONORS - returns the set of IDs for atoms which are 
hydrogen-bonding hydrogens. 

TAILOR_STORE_IT_HERE - returns the current value of a user- (and 
SPL-) accessible variable. 

TBL_ACCESS_INDEX_TO_COLNAME - converts a user-provided MSS 
column ID to a column name (name is guaranteed to be a unique 
identifier). 

TBL_GRAB_COMPLETE_FPTS - done returning multiple (scalar) values 
in an MSS column to an array. 

TBL_GRAB_GET_FPTS_INV - in a multiple value retrieval, returns the 
value corresponding to a user-provided row ID. 

TBL_GRAB_INIT_FPTS - set up for returning multiple (scalar) values 
in an MSS column to an array. 

UBS_OUTPUT_MESSAGE - equivalent to fprintf(). 

UIMS2_VAR_GET_TOKEN - returns the current value of a global SPL 
variable. 

UIMS2_WRITE_ERROR - writes text to the error output stream. 

UTL_FILE_FCLOSE, UTL_FILE_FOPEN - equivalent to fclose() and 
fopen(). 

UTL_LIST_RETRIEVE - returns the next element on a linked list. 

UTL_MEM_ALLOC - equivalent to malloc(). 

UTL_SET_AND_INPLACE - makes the first set logically equivalent to 
the second set, with only those bits that are also 1 in the third set 
becoming 1 in the first set. 

UTL_SET_CARDINALITY - returns the number of bits that are 1 in a 
particular bitset. 

UTL_SET_CLEAR - sets all bits in the set to 0. 

UTL_SET_COPY_INPLACE - makes the first set logically identical to 
the second. 

UTL_SET_CREATE - creates and returns an empty set of requested 
size. 

UTL_SET_DELETE - sets the specified bit to 0. 

UTL_SET_DIFF_INPLACE - makes the first set logically equivalent to 
the second set, with all bits that are 1 in the third set becoming 0 in 
the first set. 

UTL_SET_EMPTY - TRUE if all bits in the set are 0. 
UTL_SET_INSERT - sets the requested bit to 1. 

UTL_SET_MEMBER - returns TRUE if the requested set bit equals 1. 

UTL_SET_NEXT - returns the identity of the next non-zero bit in a 
set. 

UTL_SET_OR_INPLACE - makes the first set logically equivalent to 
the second set, with all bits that are 1 in the third set becoming 1 in 
the first set. 

UTL_STR_CMP_NOCASE - non-case sensitive version of strcmp(). 
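
The in-place set routines described above can be illustrated with a minimal sketch. This is not the SYBYL implementation: the type BitSet, the lower-case function names, the one-byte-per-bit storage, and the use of -1 as the "no more elements" value are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bitset: one byte per bit, for clarity rather than speed. */
typedef struct {
    int nbits;
    unsigned char *bits;
} BitSet;

/* Create an empty set of the requested size (cf. UTL_SET_CREATE). */
BitSet *set_create(int nbits)
{
    BitSet *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->nbits = nbits;
    s->bits = calloc((size_t)nbits, 1);
    if (!s->bits) { free(s); return NULL; }
    return s;
}

void set_insert(BitSet *s, int i) { assert(i >= 0 && i < s->nbits); s->bits[i] = 1; }
void set_delete(BitSet *s, int i) { assert(i >= 0 && i < s->nbits); s->bits[i] = 0; }
int  set_member(const BitSet *s, int i) { return i >= 0 && i < s->nbits && s->bits[i]; }

/* Number of bits that are 1 (cf. UTL_SET_CARDINALITY). */
int set_cardinality(const BitSet *s)
{
    int i, n = 0;
    for (i = 0; i < s->nbits; i++) n += s->bits[i];
    return n;
}

/* Next 1 bit after position prev, or -1 if none (assumed sentinel). */
int set_next(const BitSet *s, int prev)
{
    int i;
    for (i = prev + 1; i < s->nbits; i++)
        if (s->bits[i]) return i;
    return -1;
}

/* dst = a AND b, dst = a OR b, dst = a minus b; all sets must be equal in size. */
void set_and_inplace(BitSet *dst, const BitSet *a, const BitSet *b)
{
    int i;
    for (i = 0; i < dst->nbits; i++) dst->bits[i] = a->bits[i] & b->bits[i];
}

void set_or_inplace(BitSet *dst, const BitSet *a, const BitSet *b)
{
    int i;
    for (i = 0; i < dst->nbits; i++) dst->bits[i] = a->bits[i] | b->bits[i];
}

void set_diff_inplace(BitSet *dst, const BitSet *a, const BitSet *b)
{
    int i;
    for (i = 0; i < dst->nbits; i++) dst->bits[i] = a->bits[i] & (unsigned char)!b->bits[i];
}
```

A production bitset would pack eight bits per byte; the byte-per-bit layout is chosen here only so that each operation reads as its definition.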
APPENDIX B 



/* CODE. This code implements a PHORE_LOC column type and calculates a single 
cell value (the Hydrogen Bonding Fingerprint for a molecule) within the SYBYL 
Molecular Spreadsheet. It is to be understood that other supporting code handles 
user input, user output, and disk file I/O. */ 

/* data structure for PHORE_LOC column type */ 
typedef struct PHORE { 
    char *disco_fn;   /* user name for DISCO feature file - default 
                         appears below */ 
    int disco_in;     /* internal flag if DISCO feature file loaded */ 
    char *region_fn;  /* user name for defining region file */ 
    RegionPtr rgn;    /* internal reference to region when loaded */ 
    int nfuzz;        /* number of extra lattice points (each direction) 
                         for each PHORE feature */ 
    int nbits;        /* set length (must agree with rgn contents or EVAL 
                         fails) */ 
} PHORE, *PPHORE; 

/*jfe:QSAR_PROC_EVAL_PHORE_LOC */ 
/*********************************************************************/ 
/*    int QSAR_PROC_EVAL_PHORE_LOC(tablename, row, colname)          */ 
/*                                                                   */ 
/*    Dick Cramer 31-Jul-95 (PHORE_LOC == lattice bitset)            */ 
/*                                                                   */ 
/*    This module generates bitsets whose cardinality is equal to    */ 
/*    # lattice points x 2 (# of sitepoint classes). For each        */ 
/*    instance of a pharmacophore point in the molecule being        */ 
/*    processed, the geometrically nearest (1+m)^3 bits in the       */ 
/*    bitset will be set to 1 (where m is user supplied).            */ 
/*                                                                   */ 
/*    NOTE: this routine explicitly requires that sets begin after a */ 
/*    first element that is the set size!!!                          */ 
/*                                                                   */ 
/*    Inputs                                                         */ 
/*                                                                   */ 
/*    Outputs                                                        */ 
/*                                                                   */ 
/*    User Required Definition Files                                 */ 
/*                                                                   */ 
/*********************************************************************/ 
/*-E*/ 

int QSAR_PROC_EVAL_PHORE_LOC (tablename, row, colname) 
char *tablename, *colname; 
int row; 



CRAMER, PATTERSON, CLARK, & FERGUSON 






{ 
    mol_ptr mol; 
    PPHORE phr; 
    int err, status, nvalid, mol_area; 
    char *dum; 
    set_ptr print, qsar_proc_calc_phore_set(); 
    FILE *fp; 

    /* get the molecule */ 
    if (!TBL_UTL_GET_MOLECULE(tablename, row, FALSE, &mol)) 
    { 
        if (UTL_ERROR_IS_SET()) {err=1; goto error;} 
        else return FALSE; 
    } 

    /* get the user-provided input data */ 
    if (!TBL_ATTR_FIND_COLUMN_A(tablename, colname, "PROC_SUPPORT", &dum, 
                                (int *)&phr)) {err=3; goto error;} 

    /* retrieve DISCO stuff if not yet present */ 
    if (!phr->disco_in) { 
        if (!phr->disco_fn) {err=1; goto error;} 
        /* set appropriate tailor value, then initialize DISCO */ 
        sprintf(str, "SETVAR TAILOR!DISCO!FILE %s", phr->disco_fn); 
        UIMS2_EXEC_COMMAND(str); 
        UIMS2_EXEC_COMMAND("DISCO INIT"); 
        phr->disco_in = TRUE; 
    } 

    /* retrieve region if not yet present */ 
    if (!phr->rgn) { 
        if (!phr->region_fn) {err=1; goto error;} 
        if (!(phr->rgn = QSAR_REGION_RETRIEVE(phr->region_fn))) 
            {err=4; goto error;} 
        if (phr->rgn->n_boxes > 1) { 
            sprintf(str, "WARNING: Region %s has %d boxes. Only first will be used.\n", 
                    phr->region_fn, phr->rgn->n_boxes); 
            UBS_OUTPUT_MESSAGE(stdout, str); 
        } 
        phr->nbits = 2 * phr->rgn->n_points; 
    } 

    /* evaluate this result, first the DISCO call */ 
    if (!(print = qsar_proc_calc_phore_set(mol, phr, &nvalid))) {err=12; 
        goto error;} 

    /* go store both the bitset in the MSS "Cell_Support" and the number of bits 
       actually set in the "CELL", so there's something for the user to see */ 
    if (!TBL_ACCESS_X_PUT_VALUE(tablename, row, colname, "CELL_SUPPORT", 
                                (int *)&print)) {err=11; goto error;} 
    if (!TBL_ACCESS_X_PUT_VALUE(tablename, row, colname, "CELL", 
                                (int *)&nvalid)) {err=11; goto error;} 

    return TRUE; 

error: 
    sprintf(str, "QSAR_PROC_EVAL_PHORE_LOC (%d)", err); 
    UTL_ERROR_ADD_TRACE(str); 
    return FALSE; 
} 



set_ptr qsar_proc_calc_phore_set(mol, phr, nvalid) 
/* creates actual bitset */ 
mol_ptr mol; 
PPHORE phr; 
int *nvalid; 
{ 
    set_ptr anset = NIL, pset = NIL, SYB_FEAT_FIND_ID_SET(); 
    feat_ptr featp, SYB_FEAT_FIND_REC(); 
    atom_ptr a, SYB_ATOM_FIND_REC(); 
    int err, elem, sitebase, ci, xybase, boff, lt_base[3], lt_off[3], loff = 0, 
        hioff = 0; 
    fpt tmp; 
    BoxPtr bxptr; 
    line_ptr cdp; 

    if (!(anset = UTL_SET_CREATE(phr->nbits))) {err = 1; goto error;} 
    *nvalid = 0; 
    if (phr->nfuzz) { 
        loff -= phr->nfuzz / 2; 
        hioff += (phr->nfuzz + 1) / 2; 
    } 
    bxptr = phr->rgn->box_array; 
    xybase = bxptr->nstep[0] * bxptr->nstep[1]; 

    /* generate the DISCO sites for this molecule, which */ 
    UIMS2_EXEC_COMMAND("ECHO %DISCO_SITES()"); 
    /* become "FEATURES" + "dummy atoms" within SYBYL's molecule data 
       structure */ 
    pset = SYB_FEAT_FIND_ID_SET(mol, FEAT_V_LINE, 1, mol->nfeats); 
    if (pset) { 
        elem = -1; 
        while ((elem = UTL_SET_NEXT(pset, elem)) != NO_MORE_ELEM) { 
            if (!(featp = SYB_FEAT_FIND_REC(mol, elem))) goto error; 
            if ((featp->name[1] == 'S') && (featp->name[2] == '_')) { 
                /* have an H-bonding feature, it must represent a line */ 
                sitebase = featp->name[0] == 'A' ? 0 : phr->rgn->n_points; 
                /* the dummy atom at the end of the line is our H-bonding locus */ 
                cdp = (line_ptr) featp->dataptr; 
                if (!(a = SYB_ATOM_FIND_REC(mol, cdp->positn))) {err=2; goto error;} 
                for (ci = 0; ci < 3; ci++) { 
                    tmp = (a->xyz[ci] - bxptr->lo[ci]) / bxptr->stepsize[ci]; 
                    lt_base[ci] = (int) (tmp < 0.0 ? tmp - bxptr->stepsize[ci] : tmp); 
                } 
                /* cycle through all points touched by this locus that are also 
                   within the region */ 
                for (lt_off[0] = lt_base[0] + loff; lt_off[0] <= lt_base[0] + hioff; 
                     lt_off[0]++) 
                  if (lt_off[0] >= 0 && lt_off[0] < bxptr->nstep[0]) 
                    for (lt_off[1] = lt_base[1] + loff; lt_off[1] <= lt_base[1] + hioff; 
                         lt_off[1]++) 
                      if (lt_off[1] >= 0 && lt_off[1] < bxptr->nstep[1]) 
                        for (lt_off[2] = lt_base[2] + loff; 
                             lt_off[2] <= lt_base[2] + hioff; lt_off[2]++) 
                          if (lt_off[2] >= 0 && lt_off[2] < bxptr->nstep[2]) { 
                              boff = xybase * lt_off[2] + 
                                     (bxptr->nstep[0]) * lt_off[1] + 
                                     lt_off[0] + sitebase; 
                              UTL_SET_INSERT(anset, boff); 
                              (*nvalid)++; 
                          } 
            } 
        } 
        UTL_SET_DESTROY(pset); 
    } /* pset exists */ 

    return (anset); 

error: 
    sprintf(str, "qsar_proc_calc_phore_set (%d)", err); 
    UTL_ERROR_ADD_TRACE(str); 
    return FALSE; 
} 
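
The mapping from a site-point coordinate to bit positions in the listing above can be sketched in isolation. The following is a simplified illustration, not the SYBYL code: the type LatticeBox, the function mark_site, and the byte-per-bit array are hypothetical, and the base lattice index is taken with a floor, which is an assumption that differs from the listing's handling of points below the box origin. As in the listing, the returned count is the number of insertions made.

```c
/* Hypothetical lattice description: origin, spacing and point counts per axis. */
typedef struct {
    double lo[3];
    double stepsize[3];
    int nstep[3];
} LatticeBox;

/* Floor of v as an int (plain casts truncate toward zero). */
static int lattice_floor(double v)
{
    int i = (int)v;
    return (v < 0.0 && (double)i != v) ? i - 1 : i;
}

/* Set the bits for one site point. bits[] holds one byte per bit and must
   have room for sitebase + nstep[0]*nstep[1]*nstep[2] entries. sitebase is
   0 for one site class and n_points for the other. nfuzz widens the block
   of lattice points that is set around the base point. */
int mark_site(unsigned char *bits, const LatticeBox *box,
              const double xyz[3], int nfuzz, int sitebase)
{
    int base[3], off[3], ci, nset = 0;
    int loff = -(nfuzz / 2);
    int hioff = (nfuzz + 1) / 2;
    int xybase = box->nstep[0] * box->nstep[1];

    for (ci = 0; ci < 3; ci++)
        base[ci] = lattice_floor((xyz[ci] - box->lo[ci]) / box->stepsize[ci]);

    for (off[2] = base[2] + loff; off[2] <= base[2] + hioff; off[2]++) {
        if (off[2] < 0 || off[2] >= box->nstep[2]) continue;
        for (off[1] = base[1] + loff; off[1] <= base[1] + hioff; off[1]++) {
            if (off[1] < 0 || off[1] >= box->nstep[1]) continue;
            for (off[0] = base[0] + loff; off[0] <= base[0] + hioff; off[0]++) {
                if (off[0] < 0 || off[0] >= box->nstep[0]) continue;
                /* x varies fastest, then y, then z */
                bits[xybase * off[2] + box->nstep[0] * off[1] + off[0] + sitebase] = 1;
                nset++;
            }
        }
    }
    return nset;
}
```

With nfuzz = 1 each site point sets a 2 x 2 x 2 block of lattice points, which corresponds to the (1+m)^3 count given in the header comment of QSAR_PROC_EVAL_PHORE_LOC; lattice points falling outside the box are skipped.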



# This file determines the recognition of site points in Sybyl/DISCO. 
# See the SYBYL DISCO manual for detailed documentation. The defined types are 
# 
# (1) HB  : the QUERY is searched in the SEARCH mode, and all occurrences 
#           are assigned DISCO features according to the remaining 
#           specifications -- the three ATOMS refer to the atom number 
#           in QUERY such that the feature is DIST from the first atom 
#           at bond ANGLE with the first and second atom at each of the 
#           TORSIONS formed by the site point and the three ATOMS in order. 
#           A sitepoint of NAME is added at these extension points, 
#           -- and -- the first atom is assigned a feature complementary 
#           to the extension point (such as HBD_CO_ and RHBD_CO_). 
# (2) HBex: differs from HB in that the angles and torsions are replaced 
#           by two other arguments: whether lone pairs are part of the 
#           extension point placement, and which ATYPE (generally LP 
#           and/or H) determine the direction of the sitepoints. 



#TYPE NAME 

HB 
HB 
HB 
HB 
HB 
HB 
HB 



ATOMS SEARCH DIST ANGLE TORSIONS 



QUERY 



4 2 1 NoDup 
13 4 All 2 



DS_02C2_ 
DS_03Car_ 
DS_03Car_ 1 2 
DS_03Car_ 1 3 
DS_03Car_ 1 2 
DS_03Car_ 2 1 
DS_03C3_ 13 6 
0[f]HC(Any) (Any) C (Any) (Any) Any 
HB DS_N3C3_ 14 5 NoDup 2.9 
HB DS 02S 3 2 1 All 2.9 120 



All 2 
NoDup 
NoDup 
All 



2.9 120 "0.0 180.0" HevC(Any)=0[f ] 
9 119 "0.0 180.0" Off ]HC(:Hev) :Hev 
119 "0.0 180.0" 

9 
9 



9 

2 
2 
2.9 



NoDup 2.9 117 



110 
"0 



0[f ]C(:Hev) :Hev 
119 "0.0 180.0" 0[f]HC(=0) 
119. "0.0 180.0" 0[f]C(=0) 
120 "0.0 180.0" C(:0[f ]) :0[f ] 
"60 180 300" 



#TYPE NAME ATOMS SEARCH DIST LP 



"60 180 300" N[f ]H2ZC{Z:C&!C=OS!C:Hev} 
0 180" AnyS(=0) (=0)NH 
ATYPE Query 



HBex DS_03C3_ 2 13 NoDup 2.9 YES "LP H" 
0[fc)HC(Any) (Any) Z{Z:Hev& ! C(Any) (Any) Any} 
HB^Ic DS_03C3_ 3 12 NoDup 2.9 YES "LP" 
HBelc DS_N3C3_ 2 14 Nodup 2.9 "" "H" 
N[ H2 YaZ { Z : Hev& ! C} { Ya : C& ! C=0& ! C : Hev} 
HBe^c DS_N3C3_ 2 13 NoDup 2.9 YES "LP H" 
HBex DS_N3C3_ 3 12 NoDup 2.9 YES "LP" 
N[|fJ] (Ya) (Ya) Ya{Ya:C&!C=0&!C:Hev} 



HBex 


DS N2C2 


2 13 NoDup 


3. 


0 YES "H LP" 


HBex 


DS N2C2 


12 3 NoDup 


3. 


0 YES "H LP" 


HBex 


DS N2C2 


12 3 NoDup 


3. 


0 YES "LP" 


HBex 


DS N2N2 


2 13 NoTriv 


3 


.0 YES "LP H" 


HBex 


DS N2N2 


2 13 NoTriv 


3 


.0 YES "LP H" 


HBex 


DS N2N2 


3 2 1 NoDup 


3 


.0 YES "LP" 


hbfj 


DS 03 S 


3 2 1 NoDup 2 


.9 


128 "0.0 


hbM= 


DS 03 S 


4 2 1 All 2.9 




128 "0.0 180 


hb 


DS 03S 


4 2 1 All 2.9 




128 "0.0 180 


hb 


DS 03N 


3 2 4 All 2.9 




128 "0.0 180 


hb 


DS 02N 


4 2 1 NoDup 2 


.9 


128 "0.0 


hbex 


DS N2N2 


3 2 1 NoDup 


3 


.0 YES "LP" 


hb 


DS 03P 


3 12 All 2. 


9 


128 "0.0 180. 


hb 


DS 03P 


3 12 All 2. 


9 


128 "0.0 180. 


# #CLASSNAMES# Acceptor site 


Donor Atom DL 


HB 


AS H03C2 


13 4 All 2.9 




119 "0.0 180 



Off ] (Z) Z{Z:C&iC=Het} 



N[f]H(Ya)Ya{Ya:C&! C=OS ! C: Hev}. 



Any-N[r]=C[r] 

N[l]H:C:C:N[f ] :C:@1 

N[l]H:C:C:N[f ] :C:@1 

C:N[f ] :Hev 
180.0" HevS=0[f] 
.0" HevS(=0[f ])=0[f ] 

.0" HevS(~0[f ]) (~0[f])~0[f] 

.0" HevN(0[f ])0[f ] 

180.0" HevN(Hev) ~0[f ] 

N:N[f ] :N 
0" P(-O) (~0) (-0) (-0) 

0" P(-O) (~0) (-0) 



HB 



117 "60 180 



AS_H03C3_ 13 6 NoDup 2.9 
0[f]HC(Any) (Any)C(Any) (Any) Any 
HB AS_N3C3_ 14 7 NoDup 2.9 
N[f]H2C(Any) (Any) C (Any) (Any) Any 
HB AS_N3C3_ 15 8 NoDup 2.9 
N[f]H3C(Any) (Any) C (Any) (Any) Any 

#TYPE NAME ATOMS SEARCH DIST LP ATYPE Query 



" 0[f ]HC(:Hev) :Hev 
300" 



110 "60 180 300" 



110 "60 180 300" 
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HBex AS_HN2C2_ 
HBex AS_HN2C2_ 
HBex AS_HN2C2_ 
HBex AS_H03C3_ 2 
0[f ]HC(Any) (Any) Z 
HBex AS_HN2C2_ 3 
HBex AS_HN2C2_ 1 
HBex AS_HN2C2_ 2 
HBex AS_N3C3_ 2 
N[f ]H2C(Any) (Any) 
HBex AS_N3C3_ 2 
N[f ]H3C(Any) (Any) 
HBex AS_N3C3_ 2 1 
HBex AS_N3C3_ 2 1 
HBex AS_N3C3_ 2 1 
HBex AS_N3C3_ 3 1 
HBex AS_HN2C2_ 2 
HBex AS_HN2C2_ 3 
HBex AS_HN2C2_ 2 
HEfiPx AS_HN2C2_ 2 
HBex AS_HN2C2_ 1 
Hffdx AS_HNS3_ 6 
HBiejx AS_HN4_ 2 1 
htSsx AS_HN2N2_ 
h&4 AS_03P_ 3 
htU AS 03 P 3 



1 3 NoDup 3.0 
3 2 1 NoDup 3.0 
6 5 4 NoTriv 3.0 
1 3 NoDup 2.9 



nit iijjii NHC(Any)=0[f ] 
YES "LP H" C:N(f]H:Hev 
YES "LP" N[l]H:C:C:N[f ] :C:§1 

YES "LP H" 



{Z:Hev&!C(Any) (Any) Any} 
2 4 Nodup 3.0 YES "LP H" HevN[f]H=C 
2 3 Nodup 3.0 YES "LP" HevN[f]=C 
1 4 Nodup 3.0 "" "H" N[f]H2C(N)=N 
1 4 Nodup 2.9 YES "LP H" 
Z{Z:Hev&!C(Any) (Any) Any} 
1 5 Nodup 2.9 YES "LP H 
Z { Z : Hev& ! C (Any) (Any) Any} 



it 



NoDup 
NoDup 
NoDup 
NoDup 



2.9 
2.9 
2.9 
2.9 

3. 



3 NoDup 3.0 

2 NoDup 3.0 

4 NoDup 3.0 

3 NoDup 3.0 
3 NoDup 3.0 

5 2 NoDup 3.0 " 
3 NoDup -3.6 "" 
3 2 1 NoDup 3.0 
1 2 All 2.9 128 
1 2 All 2.9 128 



YES "LP H" N(f ]H(Ya) Ya{Ya:C&!C=0&!C:Hev} 
YES "LP H" N(f]H2(Ya)Ya{Ya:C&lC=0&»C:Hev} 
YES "LP H" N[f ]H(Ya) (Ya) Ya{Ya:C& !c=OSlC:Hev} 
YES "LP" N[f] (Ya) (Ya) Ya{Ya:C&!C=0&!C:Hev} 



YES 
YES 

ii ii iijjii 
tin iijjii 
•in ii jjii 
ii "H" 
i'C*" 

YES "LP" 
"0.0 180.0" 
"0.0 180.0" 



"H LP" N[f]H=C 
"LP" N[f]=c~Any 

N [ f ] H2Hev ( : Hev) : Hev 
N [ f ] HHev ( : Hev) :Hev 
HNC=Any 
AnyS(=0) (=0)N[f ]H 
N[f ] (Z) (Z) (Z)Z{Z:C&!C=0&!C:Hev} 
N:N[f]:N 
P(-O) (-0) (-0) (-0) 



P(-O) (-0) (-0) 
Data Set          No. of Cpds   Structure, Activity 

 1 Uehling             9        camptothecin, DNA fragmentation 
 2 Strupczewski       34        benzisoxazoles, ip Behavioral 
 3 Siddiqi            10        adenosines, Brain A1 binding 
 4 Garratt1           10        tryptamines, melanophore binding 
 5 Garratt2           14        tryptamines, melanophore binding 
 6 Heyl               11        deltorphin, opioid receptor (DAMGO) 
 7 Cristalli          32        adenosines, A2a agonists 
 8 Stevenson           5        piperidines, NK1 antagonism 
 9 Doherty             6        triarylbutenolides, endothelin-A antag. 
10 Penning            13        SC-41930 analogs, LTB4 antagonism 
11 Lewis               7        oxazolinediones, NK1 binding 
12 Krystek            30        sulfonamides, endothelin-A antagonism 
13 Yokoyama1          11        oxamic acids, T3 binding 
14 Yokoyama2          12        oxamic acids, T3 binding 
15 Svensson           13        benzindoles, 5-HT1A agonism 
16 Tsutsumi           13        peptidyl heterocycles, endopeptidase inhib. 
17 Chang              34        biphenyl sulfonamides, AT1 binding 
18 Rosowsky           10        trimetrexate analogs, DHFR inhibition 
19 Thompson            8        peptidomimetic, HIV-1 protease inhibition 
20 Depreux            26        naphthylethyl amides, melatonin displ. 



Literature References for Data Sets: 

1. Uehling, D.E., Nanthakumar, S.S., Croom, D., Emerson, D.L., Leitner, P.P., 
Luzzio, M.J., et al., Synthesis, Topoisomerase I Inhibitory Activity, and in Vivo 
Evaluation of 11-Azacamptothecin Analogs. J. Med. Chem. 1995, 38, 1106. (Table 2, 
with R2=Et; IC50 data.) 

2. Strupczewski, J.T., Bordeau, K.J., Chiang, Y., Glamkowski, E.J., Conway, P.G., et 
al. 3-[[(Aryloxy)alkyl]piperidinyl]-1,2-Benzisoxazoles as D2/5-HT2 Antagonists with 
Potential Atypical Antipsychotic Activity: Antipsychotic Profile of Iloperidone 
(HP873). J. Med. Chem. 1995, 38, 1119. (Tables 2 and 3 with n=3, X=O; ED50 for 
inhibition of apomorphine-induced climbing.) 

3. Siddiqi, S.M., Jacobson, K.A., Esker, J.L., Olah, M.E., Ji, Xi.-duo., et al., Search 
for New Purine- and Ribose-Modified Adenosine Analogs as Selective Agonists and 
Antagonists at Adenosine Receptors. J. Med. Chem. 1995, 38, 1174. (Table 1, 
R2=H; Ki(A1), values estimated from % displacement and stereoisomers averaged as 
needed.) 

4. Garratt, P.J., Jones, R., Tocher, D.A., Sugden, D., Mapping the Melatonin 
Receptor. 3. Design and Synthesis of Melatonin Agonists and Antagonists Derived 
from 2-Phenyltryptamines. J. Med. Chem. 1995, 38, 1132. (Table 1 and Table 2.) 

5. Garratt, P.J., Jones, R., Tocher, D.A., Sugden, D., Mapping the Melatonin 
Receptor. 3. Design and Synthesis of Melatonin Agonists and Antagonists Derived 
from 2-Phenyltryptamines. J. Med. Chem. 1995, 38, 1132. (Table 1 and Table 2.) 

6. Heyl, D.L., Dandabathula, M., Kurtz, K.R., Mousigian, C. Opioid Receptor Binding 
Requirements for the δ-Selective Peptide Deltorphin I: Phe3 Replacement with 
Ring-Substituted and Heterocyclic Amino Acids. J. Med. Chem. 1995, 38, 1242. (Table 1; 
binding Ki to DAMGO.) 

7. Cristalli, G., Camaioni, E., Vittori, S., Volpini, R., Borea, P.A., et al. 2-Aralkynyl 
and 2-Heteroalkynyl Derivatives of Adenosine-5'-N-ethyluronamide as Selective A2a 
Adenosine Receptor Agonists. J. Med. Chem. 1995, 38, 1462. 

8. Stevenson, G.I., MacLeod, A.M., Huscroft, I., Cascieri, M.A., Sadowski, S., 
Baker, R. 4,4-Disubstituted Piperidines: A New Class of NK1 Antagonist. J. Med. 
Chem. 1995, 38, 1264. (Table 1.) 

9. Doherty, A.M., Patt, W.C., Edmunds, J.J., Berryman, K.A., Reisdorph, B.R., et al. 
Discovery of a Novel Series of Orally Active Non-Peptide Endothelin-A (ETA) 
Receptor-Selective Antagonists. J. Med. Chem. 1995, 38, 1259. (Table 3; IC50 ETA.) 

10. Penning, T.D., Djuric, S.W., Miyashiro, J.M., Yu, S., Snyder, J.P., et al. 
Second-Generation Leukotriene B4 Receptor Antagonists Related to SC-41930; Heterocyclic 
Replacement of the Methyl Ketone Pharmacophore. J. Med. Chem. 1995, 38, 858. 
(Table 1, all; LTB4 receptor binding.) 

11. Lewis, R.T., MacLeod, A.M., Merchant, K.J., Kelleher, F., Sanderson, I., et al. 
Tryptophan-Derived NK1 Antagonists: Conformationally Constrained Heterocyclic 
Bioisosteres of the Ester Linkage. J. Med. Chem. 1995, 38, 923. 

12. Krystek, S.R., Hunt, J.T., Stein, P.D., Stouch, T.R. 3D-QSAR of Sulfonamide 
Endothelin Inhibitors. J. Med. Chem. 1995, 38, 659. 

13. Yokoyama, N., Walker, G.N., Main, A.J., Stanton, J.L., Morrissey, M., et al. 
Synthesis and SAR of Oxamic Acid and Acetic Acid Derivatives Related to 
L-Thyronine. J. Med. Chem. 1995, 38, 695. 

14. Yokoyama, N., Walker, G.N., Main, A.J., Stanton, J.L., Morrissey, M., et al. 
Synthesis and SAR of Oxamic Acid and Acetic Acid Derivatives Related to 
L-Thyronine. J. Med. Chem. 1995, 38, 695. 

15. Haadsma-Svensson, S.R., Svensson, K., Duncan, N., Smith, M.W., Lin, Ch.-H. C-9 
and N-Substituted Analogs of cis-(3aR)-(-)-2,3,3a,4,5,9b-Hexahydro-3-propyl-1H- 
benz[e]indole-9-carboxamide: 5HT1A Receptor Agonists with Various Degrees of 
Metabolic Stability. J. Med. Chem. 1995, 38, 725. 

16. Tsutsumi, S., Okonogi, T., Shibahara, S., Ohuchi, S., Hatsushiba, E., et al., 
Synthesis and Structure Activity Relationships of Peptidyl α-Keto Heterocycles as 
Novel Inhibitors of Prolyl Endopeptidase. J. Med. Chem. 1994, 37, 3492. (Table 2, 
X=CH2CH2; IC50.) 

17. Chang, L.L., Ashton, W.T., Flanagan, K.L., Chen, Ts.-Bau., O'Malley, S.S., et al., 
Triazolinone Biphenylsulfonamides as Angiotensin II Receptor Antagonists with High 
Affinity for Both the AT1 and AT2 Subtypes. J. Med. Chem. 1994, 37, 4464. (Table 
1, R3=(2-Cl)C6H5; AT1 [rabbit aorta] IC50.) 

18. Rosowsky, A., Mota, C.E., Wright, J.E., Queener, S.F., 2,4-Diamino-5- 
chloroquinazoline Analogs of Trimetrexate and Piritrexim: Synthesis and Antifolate 
Activity. J. Med. Chem. 1994, 37, 4522. (Table 2; rat liver IC50.) 

19. Thompson, S.K., Murthy, K.H.M., Zhao, B., Winborne, E., Green, D.W., et al. 
Rational Design, Synthesis, and Crystallographic Analysis of a Hydroxyethylene- 
Based HIV-1 Protease Inhibitor Containing a Heterocyclic P1'-P2' Amide Bond 
Isostere. J. Med. Chem. 1994, 37, 3100. (Table 2, X=Boc; apparent Ki.) 

20. Depreux, P., Lesieur, D., Mansour, H.A., Morgan, P., et al. Synthesis and 
Structure-Activity Relationships of Novel Naphthalenic and Bioisosteric Related 
Amidic Derivatives as Melatonin Receptor Ligands. J. Med. Chem. 1994, 37, 3231. 
APPENDIX D 



A list of 736 commercially available thiols broken down into 231 clusters based on topomeric 
CoMFA field descriptors along with the systematic name applicable to each. The 231 clusters 
are sorted by proposed name, first by the "root" structure, i.e., the fragment attached 
immediately to the -SH, and then by the substitution pattern on that "root" substructure. The 
names describe topologically equivalent hydrocarbons, i.e., structures in which all monovalent 
atoms are replaced by hydrogens and the other atoms by carbons. 
Cluster ID 


Size 


Root 


Substitution 


1 


Z O 


aryl 


o imp i e 


1 A A 

144 


1 


aryl 




1 / / 


1 


aryl 


z , j # D-jxie— ^t — fr 


163 c 


1 


aryl 


2 , 3- (4- (2 , 3-Pr) 5het ) 5het0 


151 


1 


aryl 


2 , 3 - ( 4 -Bu ) She tO- 5 -Me 


33 


5 


T 

aryl 


2 , 3 -Benzo 


80 


2 


aryl 


2, 5-Me 


192 


1 


aryl 


2 , 5-Me-3-iPe 


7 


14 


aryl 


2 , 6-NoH-3 (4/5) -Me 


27 


6 


aryl 


2 , 6-NoH-3-Ar 


107 


2 


aryl 


2- (2-Bz) PheEt-4, 5-Benzo 


189 


1 


aryl 


2- (3 , 5-Me) Ar-4, 5-Benzo 


141 


1 


aryl 


2- (4-Et) PhePr 


205 


1 


aryl 


2- (4-Stilbenyl)Stilbenyl 


188 


1 


aryl 


2-5hetCH2-4 / 5-Benzo 


56 


3 


aryl 


2-Ar 


138 


1 


aryl 


2-Ar-3 , 5-Me 


190 


1 


aryl 


2-Ar-4 , 5- (3 , 4-Et) Benzo 


41 


6 


aryl 


2-Ar-4 , 5-Benzo 


152 


1 


aryl 


2-Bz 


16 


9 


aryl 


2-Et 


85 


2 


aryl 


2-NoH-3-Et-5-Me 


106 


2 


aryl 


2-PheEt-4, 5-Benzo 


77 


2 


aryl 


2-PhePr 


142 


1 


aryl 


2-R8 


121 


2 


aryl 


2-Stilbenyl 


97 


2 


aryl 


3 , 4- (3-Me) Benzo 


218 


1 


aryl 


3 , 4- (a,b) IndenO 


164 


1 


aryl 


3,4-(a,b, ( 8 -Ar) IndenO) -6 -Me 


98 


2 


aryl 


3,4-(a,b, (c-Me) IndenO) 


99 


3 


aryl 


3,4-(a,b-Naphtho) 


157 


1 


aryl 


3,4-Ar 


58 


3 


aryl 


3 , 4-Benzo-5-Me 


100 


2 


aryl 


3 , 4-Benzo-6-tBu 


37 


5 


aryl 


3, 5-Me 


180 


1 


aryl 


3- (2,3-Benzo-4-Et) 5het 


199 


1 


aryl 


3- (2,3-Benzo-5-Me) 5het 


182 


1 


aryl 


3- (2-Me-3-5het-5-Et) 5het 


115 


2 


aryl 


3-(3-5het)5het 


193 


1 


aryl 


3- (3-Ar) 5het-4-Me 


67 


3 


aryl 


3-Ar 


129 


2 


aryl 


3-Ar-4- (2-Me) 5hetCH2 


46 


4 


aryl 


3-Ar- 5 -Me 


155 


1 


aryl 


3-Bz 


82 


2 


aryl 


3-Bz-5 , 6-Benzo 


10 


16 


aryl 


3-Me 






70 


3 


aryl 


3 -Napncii 


73 


3 


aryl 


3 -Pr-4-sBu-o-ne 


95 


2 


aryl 


3-iPr 


88 


2 


aryl 


4-Ar 


81 


2 


aryl 


4-Bz 


48 


4 


aryl 




2 




aryl 


*± lie 


3Z 


A 


aryi 


*± i\ j < 


n ft 

90 


A 

ft 


aryl 




i ft 
19 


Q 

o 


aryl 




148 c 


1 


aryl 


(adenosine) 


228 


1 


aryl 


( fluorescein) 


12 


10 


5het 


Simple 


50 


4 


5het 


2 , 3 - ( a , b-Naphtho ) 


139 


1 


5het 


2 , 3-5hetO-4-Me 


89 


2 


5het 


'2,3-Ar 


173 


1 


5het 


2- (2, 5-Et) Ar-3-Et 


69 


3 


5het 


2- (2-Me) Ar-3- (2-Me) PneEt 


198 


1 


5het 


2- (2-Me) Ar-3-R10 


174 


1 


5het 


2- (2-sBu) -3-Et 


171 


1 


5het 


2- (3 , 5 -Me) Ar-3 -bnet 


170 


1 


5het 


2 - (3 , 5 -Me) Bz-3 , 4-Benzo 


123 


2 


5het 


2- (3-Et) Ar-3 -Bz 


22 


7 


5het 


2- (4-Et) Ar 


202 


1 


5het 


2- (4-Et)Ar-4- (4-Me) Ar 


122 


2 


5het 


2- (4-iPr) Ar-3-Bz 


197 


1 


5het 


2-5hetCH2-3- (4-tBu) Ar 


6 


14 


5het 


2-Ar 


225 


1 


5het 


2 -Ar-3 - (2-Ar) ShetBu 


224 


1 


5het 


2 -Ar-3- (2-Ar) 5hetCH2 


63 


3 


5het 


2 -Ar-3- (2-Bz) Ar 


178 


2 


5het 


2 -Ar-3- (2-Me) bnet 


72 


3 


5het 


2 -Ar-3- (3 , 4-Et) Bz 


40 


5 


5het 


2 -Ar-3- (3-Ar) SHetEt 


183 


1 


5het 


2 -Ar-3- (3-Ar) PhePr 


64 


3 


5het 


2 -Ar-3- (3-Ar-5-Me) 5net 


105 


2 


5het 


2 -Ar-3- (3-Me) Ar 


160 


1 


5het 


2 -Ar-3 - (4-Ar) Cyrix 


146 


1 


onet 


z — AT J v ft— Ar ; L.yuALnz 


*i r\ "i 

203 


1 


onec 




XZ 0 


9 




2 -Ar-3 - ( tBu) Ar 


1 / 


Q 




9 -Ar-3 -Ar 


211 c 


1 


C "U ^ 4- 

bnet 


& — Ar — j - Dcazy i xuene 


124 


2 


bnet 


z Ar j inaeutni 


28 b 


6 


5het 


2 -Ar-3 -Me 


30 


6 


5het 


2-Ar-3-PnePr 


204 


1 


5het 


2-Ar- 5- (4- (2, 4-Me) Bz) Ar 


79 


2 


5het 


2-Bz 


78 


2 


5het 


2 -Bz-3 , 4-Benzo 


117 


2 


5het 


2-Cyhx 


186 


1 


5het 


2-Cyhx-3 / 4-iPe 


68 


3 


5het 


2-Et 


112 


2 


5het 


2-Et-3- (2-Me) PheEt 






128 


2 


bnet 


93 


2 


bnet 


61 


3 


bnet 


181 


1 


bnet 


49 


4 


C Vnat- 
DneT- 


86 


2 




91 


2 


bnet 


4 


17 


bnet 


172 


1 


bnet 


38 


5 


bnet 


13 


10 


bnet 


222 


1 


bnet 


66 


3 


bnet 


29 


6 


bnet 


71 


3 


bnet 


108 


2 


onet 


127 


2 




54 


3 


bnet 


221 


1 


briet 


187 


1 


bnet 


143 


1 


bnet 


96 


2 


bnet 


162 


1 


P" L — . 

5het 


169 


1 


5het 


94 


2 


bnet 


210 


1 


bnet 


36 


15 


bnet 


176 


1 


bnet 


196 


1 


bnet 


159 


1 


bnet 


42 


4 


bnet 


200 


1 


bnet 


113 


2 


bnet 


125 


2 


5het 


191 


1 


5het 


145 


1 


5het 


114 


2 


5het 


18 


8 


5net 


59 


3 


bnet 


65 


3 


bnet 


24 


«"7 

7 


bnet 


A A 

44 


0 


JilC L- 


bz 


c 
3 




111 


Z 






1 

X 




32 b 


6 


PL A t 

bnet 


223 


1 


PL A l. 

bnet 


185 


1 


bnet 


34 


5 


alJcyl 


104 


2 


alkyl 


62 


3 


alkyl 


3 


18 


alkyl 


14 


9 


alkyl 



2 -Me- 3 , 4- ( 3 -Me) Benzo 

2 -Me- 3 , 4 -Benzo 

2-Me-3- (2,3,4-Me) 5het 

2-Me-3- (2, 3-Benzo-4-Et) 5het 

2-Me-3-(3-Ar)5het 

2-Me-3- (3-Ar) 5hetPr 

2-Me-3- (3-Ar- 5 -Me) 5het 

2-Me-3- (3-Bz)Ar 

2-Me-3- (4-tBu)PheEt 

2-Me-3-5Het 

2-Me-3-Me 

2-Me-3-Pe 

2-Me-3-PheEt 

2-Me-3-PhePr 

2-Me-3-R8+ 

.2-Me-5-Bu 

2-Pe-3-Ar 

2-Pr 

2-R12 

2-iBu-3,4-iPe 

2- iPe-3 , 4-Benzo 
3,4- (2,4-Me)Benzo 
3 , 4- (3-Ar) Benzo 
3,4-(3-Hx)Benzo 

3 , 4- (3-Pr) Benzo 
3,4- (a,b-Napththo) 
3 , 4-Benzo 

3- (2,4-Me)Bz 
3-(3,5-Me)Ar 
3- (3-Ar) 5het 
3-(3-Bz)Ar 
3-(3-Me)PheEt 
3-(4-Me)Ar 

3- (4-tBu)Ar 

3- (Al-4-Et) PheEt 

3- (B-Ar) PhePr 

3-5hetCH2 

3-Ar 

3-Ar(2-thia) 
3-Bu 

3-Me-5-H 

3-Me-5-NoH 

3-Pe 

3 -PheEt 

3-PhePr 

3-Pr 

3-R13 

(chrysenO) *- 

Simple 

(3) (Bl) (Bl) 

(3 -Me) PhePr 

(3:4) 

(3:4) (Al) 








60 


3 


alkyl 


(3 : 4) (Bl) 




226 


1 


alkyl 


(4) (Al) (A-tBu) (CI) (CI) 




45 


4 


alkyl 


(4) (Dl) (Dl) 




35 


7 


alkyl 


(4 -Me) PnePr 




168 


1 


alkyl 


(4-xPe) PnePr 




47 


4 


alkyl 


(5) (Al) 




179 


1 


alkyl 


(5) (Bl) (E- (2-Ar-5-Me) oriet) 




103 


2 


alkyl 


(5) (B3) 




76 


2 


alkyl 


(5) (CI) (CI) 




83 


2 


alkyl 


(5) (C2) 




216 


1 


alkyl 


(5) (C2) (D2) (D2) 




43 


8 


alkyl 


/C ^\ /n1 / T3 1 /pi \ 

(5:6) (Dl/ ol/ r X) 




5 


15 


alkyl 


(5:7) 




158 


1 


alkyl 


(6) (B8) (CI) (El) (El) 




140 


1 


alkyl 


(6) (F-Ar) 




166 


1 


alkyl 


-(7) (A8) (Fl) 




53 


3 


alkyl 


(7) (D3) (D3) 




207 


1 


alkyl 


(8) (C3) 




8 


13 


alkyl 


(8 : 11) 




206 


1 


alkyl 


(3) (B4) (G3 ) 


4? 


7b 


3 


axKyx 


/1A \ /"PI \ I \ /pi \ 
\L\J) \-DXJ V " J- / 




136 


1 


axKyx 


VXUJ \E>3) \-c»z; 




z U 


Q 
O 


axKy x 


vox; 


:. : 


3 y 


n 
i 


alKyl 


/I u\ f Rl ^ 

v x x t* / v oi ; 




154 c 


1 


alkyl 


(12 ) (A-PneEt ) 




230 


1 


alkyl 


(12) (F6) (Fl) 


a. 1 

r~ 


131 


2 


alkyl 


(12) (F6) (F6) 




15 


9 


alkyl 


(12 + ) 




13 / 


JL 




V x J ; \ iLft ; 


Q 




1 

X 


alkyl 


(A-Ar) (A-Ar)Bz 




99 Q 

A A j 


1 

X 


alkvl 


(A-Bz) (A-Bz)PheEt 


f ^ 


184 

x o ** 


1 


alkyl 


(Al) PheEt 


: fs 


997 c 


1 

X 


alkyl 


( cholesterol ) 






X 


alJCyl 


\li y jJ luLC/ 




23 


/ 


axicyx 


rilcDU 




*7 A 


3 


axKyx 


irneciu 




25 b 


6 


alkyl 


PhePr 




11 


10 


benzyl 


Simple 




102 


2 


benzyl 


2, 4, 5-Me 




57 


3 


benzyl 


2 , 4, 6-Me 




217 


2 


benzyl 


2- (3- (2-Et) Ar) Ar 




213 


1 


benzyl 


2-Et-3- (2, 3-Et-5-Me) Ar-5-Me 




212 


1 


benzyl 


2-R8-3-Naphthyl-4 , 5-Benzo 




9 


13 


benzyl 


2/3 -Me 




84 


2 


benzyl 


3 / 4-Benzo 




13 2 


z 


Denzyx 






13 U 


z 


Denzyx 


J \ ft O L 1 1 JJCIljf. X / QLllWCllJf X 




1 T / 

13 4 


z 


oenzyx 


ft ^ J rix ; Ai 




0 1 
Z X 


•7 

/ 


Denzyx 


ft IjL 




26 b 


6 


benzyl 


4 -Me 




156 


1 


benzyl 


4-PhePr 




201 


1 


benzyl 


4-tBu 




135 


2 


alkenyl 


Ar. . (2-Et)Ar - 



220 


1 


alkenyl 


Ar. . (4-Bz)Ar 


116 


2 


. alkenyl 


Ar. .Ar 


133 


2 


alkenyl 


Ar . .Bz 


110 


2 


alkenyl 


Et . CN . C0NH2 


87 


2 


alkenyl 


NH2 . CN - N— NPil 


lift 


Z 


a 1 T^^t^"\^1 




X £t VJ 


2 


alkenvl 


P (Pr) 3 . . Ar 


X X o 


2 


alkenvl 


P (iPe) 3 . . Ar 


R1 


4 


alkenvl 


PCyhx3 . . Ar 


1 o c C 


X 


alkenyl






6 


alkenyl 


PEt3..Ar


194 


1 


alkenyl 


PEt3..Bz


109 


2 


alkenyl 


PheEt.CN.CONH2


101 


2 


cyclohexyl 


Simple 


149 


1 


cyclohexyl 


1-Me-2,4-CMe2


55 


3 


cyclohexyl 


2,3,4,5-iBu


147 


1 


cyclohexyl 


2,3,4-iBu-5-iPe


209 


1 


cyclohexyl 


2-(3,4-PheEt)bnet-o-Me


208 


1 


cyclohexyl 


2-Me-3,5-CMe2


167 


1 


cyclohexyl 


2-Me-4-sPe 


165 


1 


cyclohexyl 


2-iPr-3,5-Me


150 


1 


cyclohexyl 


3-sPe-6-Me 


161 


1 


cyclohexyl 


4-Et-4-iBu 


219 


1 


cyclohexyl 


(complex) 


175


1 


cyclopentyl


2-Ar-4-spiro 


215 


1 


cyclopentyl 


3-PhePr 



a To generate these names, all heteroatoms are first replaced by 
carbon (to produce the simplest common topology), and a particular 
structure is chosen from among these topologies as the "most typical" 
of that cluster, if possible one containing the largest substructure 
that distinguishes that cluster from all others. 
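
The first step of this naming procedure, replacing heteroatoms by carbon, can be sketched in C as follows. This is only an illustration under stated assumptions: the ToyAtom record and the function name heteroatoms_to_carbon are hypothetical and are not taken from the appendix code, hydrogens are assumed to be left unchanged, and no valence adjustment is attempted.

```c
#include <string.h>

/* Hypothetical minimal atom record (an assumption for illustration;
   not a structure declared in the appendix code). */
typedef struct {
    char element[3];   /* element symbol, e.g. "C", "N", "Cl" */
} ToyAtom;

/* Replace every atom that is neither carbon nor hydrogen by carbon,
   giving the "simplest common topology". Valences are not adjusted.
   Returns the number of atoms that were changed. */
int heteroatoms_to_carbon(ToyAtom *atoms, int n_atoms)
{
    int changed = 0;
    for (int i = 0; i < n_atoms; i++) {
        if (strcmp(atoms[i].element, "C") != 0 &&
            strcmp(atoms[i].element, "H") != 0) {
            strcpy(atoms[i].element, "C");
            changed++;
        }
    }
    return changed;
}
```

Choosing the "most typical" structure of a cluster is a judgment made on the resulting topologies and is not covered by this sketch.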

Within the name of a substitution, numbers indicate positions when 
the substitution is on a ring, but chain length when the substitution 
is on a chain (numbers separated by a colon indicate a range of chain 
lengths). Within a chain, letters indicate the position of 
substitution. (For example, (C2) describes a two-atom branch from 
the third position of a chain, while 3-PhePr describes a phenylpropyl 
skeleton attached to the 3-position of a ring.) 
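
As an illustration of the chain convention, the following C sketch decodes a branch token such as C2 (the text inside the parentheses). The function name decode_branch is hypothetical, and the mapping A = 1, B = 2, and so on is an assumption inferred from the (C2) example, in which C denotes the third position.

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode a chain-branch token such as "C2".
   On success, stores the 1-based chain position (A=1, B=2, ...; an
   assumption inferred from the "(C2)" example) and the branch length,
   and returns 1. Returns 0 if the token does not have the shape
   <uppercase letter><positive integer>. */
int decode_branch(const char *token, int *position, int *length)
{
    char *end;
    long len;

    if (!isupper((unsigned char)token[0]) ||
        !isdigit((unsigned char)token[1]))
        return 0;
    len = strtol(token + 1, &end, 10);
    if (*end != '\0' || len <= 0)
        return 0;
    *position = token[0] - 'A' + 1;
    *length = (int)len;
    return 1;
}
```

With this sketch, the token C2 yields position 3 and length 2, while a token such as A-PheEt, which names a skeleton rather than a chain length, is rejected.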

A dot notation (.) separates the three possible substituents on an 
alkenyl root. The substituents are listed in this order: first the one 
on the same carbon as the -SH substituent, then the one trans to the 
-SH, and finally the one cis to the -SH. 
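
A minimal C sketch of how such a dotted name could be split into its three slots is given here. The function name split_alkenyl_name and the buffer size MAX_SUBST_LEN are hypothetical, and reading an empty field (as in Ar..Ar) as an unsubstituted position is an assumption.

```c
#include <string.h>

#define MAX_SUBST_LEN 32   /* hypothetical buffer size per substituent */

/* Split an alkenyl name such as "Et.CN.CONH2" or "Ar..Ar" at its dots.
   parts[0] = substituent on the same carbon as -SH,
   parts[1] = substituent trans to -SH,
   parts[2] = substituent cis to -SH.
   An empty string is taken to mean "no substituent" (an assumption).
   Returns 1 when exactly three fields are found, 0 otherwise. */
int split_alkenyl_name(const char *name, char parts[3][MAX_SUBST_LEN])
{
    int field = 0;
    size_t len = 0;

    parts[0][0] = parts[1][0] = parts[2][0] = '\0';
    for (const char *p = name; *p != '\0'; p++) {
        if (*p == '.') {
            if (++field > 2)
                return 0;              /* more than three fields */
            len = 0;
        } else {
            if (len + 1 >= MAX_SUBST_LEN)
                return 0;              /* field too long for the buffer */
            parts[field][len++] = *p;
            parts[field][len] = '\0';
        }
    }
    return field == 2;
}
```

For Ar..Ar the middle slot (trans to -SH) comes back empty, and a name without dots, such as PhePr, is rejected because it does not follow the alkenyl convention.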

The above notwithstanding, any name enclosed completely in 
parentheses takes its usual structural meaning. 






Here are structural descriptions for each name abbreviation in the 
above table, mostly in SLN (SYBYL Line Notation), listed 
alphabetically. (SLN extends SMILES with the following concepts, 
among others. Hydrogens are explicit. Ring openings and closures 
begin with a number enclosed by [] and end with the matching 
number preceded by @. Other SLN symbols used in these 
definitions are: ~ = any bond; - = single bond (used here to provide a 
reference for [R]); : = aromatic bond; ! = the SLN following (here in 
parentheses) is not allowed; [F] = no additional atoms may be 
attached to the preceding atom; [!R] = preceding bond may not be in 
a ring; [R] = preceding bond must be in a ring.) 
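
The ring opening and closure rule can be checked mechanically. The C sketch below is an illustration only, not part of SYBYL and not a full SLN parser; the function name sln_rings_balanced and the limit MAX_RING_ID are hypothetical. It verifies that every @n closure refers to a ring previously opened with [n] and that no ring is left open, skipping bracketed attributes such as [R], [!R], and [F].

```c
#include <ctype.h>

#define MAX_RING_ID 100   /* hypothetical upper bound on ring numbers */

/* Return 1 if every "@n" closure matches an earlier "[n]" opening and
   every opened ring is eventually closed; return 0 otherwise.
   Bracketed attributes that are not plain numbers are ignored. */
int sln_rings_balanced(const char *sln)
{
    int is_open[MAX_RING_ID] = {0};

    for (const char *p = sln; *p != '\0'; p++) {
        if ((*p == '[' || *p == '@') && isdigit((unsigned char)p[1])) {
            const char *q = p + 1;
            int id = 0;
            /* Read the ring number, stopping early if it gets too large. */
            while (isdigit((unsigned char)*q) && id < MAX_RING_ID)
                id = id * 10 + (*q++ - '0');
            if (id >= MAX_RING_ID)
                return 0;
            if (*p == '[') {                 /* ring opening "[n]" */
                if (*q != ']' || is_open[id])
                    return 0;
                is_open[id] = 1;
                p = q;                       /* continue after ']' */
            } else {                         /* ring closure "@n" */
                if (!is_open[id])
                    return 0;
                is_open[id] = 0;
                p = q - 1;                   /* continue after the digits */
            }
        }
    }
    for (int i = 0; i < MAX_RING_ID; i++)
        if (is_open[i])
            return 0;
    return 1;
}
```

Applied to a six-membered aromatic ring written as C[1]:C:C:C:C:C@1, the check succeeds; a string with an opening but no matching closure fails.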

5het = 5Het = C[1]:C:C:C:C:@1. 
alkenyl = C=C. 
alkyl = C~[!R]C. 
aryl = Ar = Phe = Ph = C[1]:C:C:C:C:C@1. 
benzyl = Bz = HSC-[!R]C~[R]C. 
Bu = C-[!R]C-[!R]C-[!R]C-[!R]C. 
cyclohexyl = Cyhx = C[1](-I=)C~C-C~C~C~@1. 
cyclopentyl = C[1]~(-I=)C~C~C~C~@1. 
Et = C-[!R]C. 
inden = C[1]:C(~C~X~[2]):C(~@2):C:C@1. 
iBu = C-[!R]C-[!R]C(-[!R]C)-[!R]C. 
iPe = C-[!R]C-[!R]C-[!R]C(-[!R]C)-[!R]C. 
Me = C. 
naphth = C[1]:C(~C~X~[2]):C(~@2):C:C:C@1. 
NoH = !(CH). 
O denotes ring fusion, e.g., benzo fuses a 6-membered aromatic ring. 
Pe = C-[!R]C-[!R]C-[!R]C-[!R]C-[!R]C. 
Pr = C-[!R]C-[!R]C-[!R]C. 
R# = alkyl chain of approximate length #. 
Simple = !(C~[!R]C). 
sPe = C(-[!R]C)-[!R]C-[!R]C-[!R]C-[!R]C. 
Stilbenyl = C=[!R]C-[!R]C[1]:C:C:C:C:C@1. 
tBu = C(-[!R]C)(-[!R]C)-[!R]C. 



